TECHNIQUES FOR THE STUDY OF GROUP STRUCTURE AND BEHAVIOR:
II. EMPIRICAL STUDIES OF THE EFFECTS OF STRUCTURE IN SMALL GROUPS¹


MURRAY GLANZER² AND ROBERT GLASER
American Institute for Research and University of Pittsburgh 


An earlier paper (Glanzer & Glaser, 1959) reviewed techniques for analyzing the structure of groups that had been permitted to form their own pattern of interaction. This paper reviews laboratory studies in which experimenters imposed different structures on groups and measured the effect of the structures on performance.

¹ Prepared under Contract Nonr 2551(00) between the Office of Naval Research, Psychological Sciences Division, Personnel and Training Branch, and the American Institute for Research, as part of a research project on team training and performance. The authors wish to thank Alex Bavelas, Harold Guetzkow, Robert L. Hall, John T. Lanzetta, Seymour Rosenberg, Marvin E. Shaw, and Gerald Shure, who read and commented on earlier drafts of the paper.

² Now at Walter Reed Army Institute of Research.

The laboratory studies focus on communication structure. A communication structure is a set of positions with specified communication channels. Between any two positions, there may be a two-way channel, a one-way channel, or none at all. A channel is essentially the probability that a message can pass in a given direction between two positions. It may be defined more generally as the probability, $p_{ab}$, that a message can get from Position a to Position b. This is not the probability that a will try to send to b. It is, rather, the probability of his getting a message through if he tries to send one. In most of the structures studied, the channels are symmetric, i.e., $p_{ab} = p_{ba}$, and the channels are either available or not, i.e., $p_{ab} = 0$ or 1.
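The channel definition can be made concrete as a matrix of probabilities. The following sketch is an illustration added here, not a procedure from any of the studies reviewed: it assumes a network is stored as a nested list `p` in which `p[a][b]` is $p_{ab}$, and the helper names `is_symmetric` and `is_all_or_none` are hypothetical. It tests the two properties that hold for most of the structures studied.

```python
from typing import List

Matrix = List[List[float]]  # p[a][b] = probability a message from a reaches b


def is_symmetric(p: Matrix) -> bool:
    """True if p_ab == p_ba for every pair of positions (two-way channels only)."""
    n = len(p)
    return all(p[a][b] == p[b][a] for a in range(n) for b in range(n))


def is_all_or_none(p: Matrix) -> bool:
    """True if every channel between distinct positions is open (1) or absent (0)."""
    n = len(p)
    return all(p[a][b] in (0, 1) for a in range(n) for b in range(n) if a != b)


# Hypothetical five-man Circle: each position has open two-way channels
# to its two neighbours and no channel to the other two positions.
circle = [[1 if abs(a - b) in (1, 4) else 0 for b in range(5)] for a in range(5)]

# Hypothetical three-man net: a -> b is a one-way channel (b -> a closed),
# and b <-> c passes messages only half the time.
mixed = [
    [0, 1, 0],
    [0, 0, 0.5],
    [0, 0.5, 0],
]
```

When both checks pass, the matrix is just the adjacency matrix of an undirected graph, which is the form the distance-based measures discussed under The Initial Work assume.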

The studies are grouped in the following sections: The Initial Work; Variations and Further Analysis of the Basic Design; Testing the Limits of the Basic Design; Mathematical Analysis; Emphasis on Distribution of Functions in the Simulated Team; and Emphasis on Feedback and Learning. Tables summarize the main findings for the studies reviewed, often including more details than those covered in the text. The tables introduce a number of necessary simplifications. When an investigator employed several closely related measures, e.g., group morale, job satisfaction, status evaluation, findings on only one are included. Findings not presented in a form that permits evaluation are omitted. Findings on the effect of trials or learning, a statistically significant variable in almost all types of groups, are omitted unless especially relevant. In order to permit comparison between studies in both the tables and the text, the same terms will be used throughout for a set of related measures, even when this departs from the investigator's usage. For example, morale will refer to a variety of measures concerned with satisfaction in the experimental situation. For the same reason, although different names are used in various studies for the same network, one name will be used throughout this review.


THE INITIAL WORK 


The area of communication structure was opened up 12 years ago by Bavelas (1948) with a discussion of mathematical aspects of group structure. The paper is Lewinian in tone, using the terminology of boundary, region, etc. The Lewinian boundary, however, is translated into the link or channel. This translation is of major importance for all the work that follows. Bavelas then builds up a set of assumptions and definitions concerning a collection of cells. He defines cell boundary, region, open cell, closed cell, region boundary, chain, chain length, structure, cell distance, cell-region distance, etc., and considers the factors that cause each measure to vary, deriving theorems concerning the following: the limits of the values for the various distances and other measures, the relation between the distance measures and the spread of a change of state in the structure, and characteristics of pathways within the structure. Bavelas then shows how the various distances change as a function of structure types (e.g., organizations with varying degrees of horizontal coordination) and as a function of an increase in the number of levels in the organization. He also discusses the role of special positions, such as liaison positions, and possible applications of his approach.

The many provocative points raised in the paper were not directly followed by experimental work. Experimental work was set off by a second, much simpler paper (Bavelas, 1950), which differs markedly from the first. The Lewinian tone has disappeared. For example, regions within structures are not mentioned. Bavelas now discusses a few simpler concepts which readily generate experimental situations. Complex concepts such as inner and outer regions and chains of connecting cells do not appear again in the work in this area. The only concepts that survive from the first paper are those of links and distances. The focus of the discussion changes, moreover, from the larger in situ group, e.g., an industrial organization, to the small laboratory group.

In the second paper, Bavelas introduces the communication networks which were to become standard experimental arrangements. The channels of these networks are all two-way channels: a channel from a to b is also a channel from b to a. He also introduces the index of relative centrality to describe the structures. The index of relative centrality of Position x is the ratio of the sum of the minimal distances of all positions to all others over the sum of the minimal distances of Position x to all others, or

$$C(x) = \frac{\sum_{i}\sum_{j} d_{ij}}{\sum_{y} d_{xy}} \qquad [1]$$

where $d_{xy}$ is the minimal distance between x and y. Many of the investigations of group structure published after this paper focus on this measure. In subsequent studies C(x) is also summed over all positions, x, in the network to give net centrality, $\sum_{x} C(x)$.
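Equation 1 can be computed mechanically. The sketch below is an illustration added here: it finds minimal distances by breadth-first search and applies Equation 1 to the five-man networks of Figure 1, with positions labelled a to e and c taken as the hub of the Wheel and the Y (a labelling assumed from Table 2). The function names are hypothetical, and the code assumes every position can reach every other; in a disconnected net some distances would be undefined.

```python
from collections import deque
from typing import Dict, List, Tuple

Network = Dict[str, List[str]]  # position -> positions sharing a two-way channel


def distances_from(net: Network, source: str) -> Dict[str, int]:
    """Minimal distance (number of links) from source to every position, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in net[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def relative_centrality(net: Network) -> Dict[str, float]:
    """Equation 1: sum of all minimal distances over the distance sum of position x."""
    row_sums = {x: sum(distances_from(net, x).values()) for x in net}
    total = sum(row_sums.values())
    return {x: total / row_sums[x] for x in net}


def net_centrality(net: Network) -> float:
    """Sum of C(x) over all positions x in the network."""
    return sum(relative_centrality(net).values())


def undirected(edges: List[Tuple[str, str]]) -> Network:
    """Build a network in which every listed channel works in both directions."""
    net: Network = {}
    for a, b in edges:
        net.setdefault(a, []).append(b)
        net.setdefault(b, []).append(a)
    return net


CIRCLE = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")])
CHAIN = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])
WHEEL = undirected([("c", "a"), ("c", "b"), ("c", "d"), ("c", "e")])
Y = undirected([("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])
```

Rounded to one decimal, the Chain positions come out at 4.0, 5.7, 6.7, 5.7, and 4.0, and the nets at 25.0 (Circle), 26.1 (Chain), and 26.2 (Y), in line with Figure 1. For the Wheel, exact arithmetic gives 26.29; the printed 26.4 appears to be the sum of the rounded entries (8.0 + 4 × 4.6).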

The main question Bavelas now asks is the following: Is it possible that

among several communication patterns, all logically adequate for the successful completion of a specified task, one gives significantly better performance than another (p. 726)?


In answer to this question, he describes experimental results obtained by S. L. Smith (unpublished report) and Leavitt (1951), who use an experimental arrangement that is the model for the majority of the subsequent studies.

Five subjects were each given a list of symbols. Their task was to discover which symbol they all had in common. The physical setting was arranged so that some group members could send messages to each other and others could not. Smith's and Leavitt's subjects sat around a table partitioned into five sections with slots, some of which were open to allow notes to pass between sections. The pattern of open slots determined the communication pattern or structure. The subjects were free to use the open communication channels in any way they wished. They were not told the structure of the network.

The group's task required two main steps: distribution of individual information so that some or all members had all the necessary information, and determination of the common symbol. The task was completed when all subjects gave the answer. Smith imposed two communication structures, Circle and Chain (see Figure 1), and finds that structure affects group performance. The Chain is more efficient than the Circle. The performance ascribed to individuals is related to their positions. The central positions are most frequently seen as leaders. Table 1 indicates the type of data analysis carried out in the network studies. The two main independent variables are the patterns as units (Circle versus Chain) and the positions within a given pattern (a, b, c, d, and e).

Leavitt (1951) used the same physical arrangement and problem as Smith did, but with four structures: Circle, Chain, Wheel, and Y (see Figure 1). His main positive findings are that the Wheel, Y, Chain, and Circle (most centralized to least centralized) rank in descending order (best to worst) with respect to the following: (a) speed of development of organization for problem handling (the Wheel, Y, and Chain were, moreover, stable once they had developed their organization, whereas the Circle was inconsistent, i.e., its problem-solving procedure never became fixed); (b) agreement on who the group leaders were; and (c) satisfaction with the group. The ordering on these characteristics correlates perfectly with the ordering of the values of the net centrality index, $\sum_{x} C(x)$.
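Because the claim concerns orderings, Spearman's rank correlation is a natural check. The sketch below is an illustration added here, using the net centrality values printed in Figure 1 and Leavitt's best-to-worst order from the text; it computes $\rho = 1 - 6\sum d^2 / [n(n^2 - 1)]$, a formula valid only when there are no tied ranks. The helper names are hypothetical.

```python
from typing import Dict


def ranks(values: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 = largest value. Assumes no ties."""
    order = sorted(values, key=values.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(order)}


def spearman_rho(r1: Dict[str, int], r2: Dict[str, int]) -> float:
    """Spearman rank correlation for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(r1)
    d2 = sum((r1[k] - r2[k]) ** 2 for k in r1)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Net centrality as printed in Figure 1.
net_centrality = {"Wheel": 26.4, "Y": 26.2, "Chain": 26.1, "Circle": 25.0}

# Leavitt's order, best = 1, on speed of organization, leader agreement, and satisfaction.
leavitt_rank = {"Wheel": 1, "Y": 2, "Chain": 3, "Circle": 4}

rho = spearman_rho(ranks(net_centrality), leavitt_rank)
```

With identical orderings every rank difference is zero, so `rho` is 1.0.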

During the course of 15 trials, all the structures showed learning, reducing the time to complete trials. The networks did not, however, differ clearly from each other in speed or in learning rate. Leavitt asserts that the Circle used more messages and made more errors than the other networks. The interpretation of these data is unclear, however, since the analyses are based on a selection of the data, e.g., number of messages on successful trials.

Fig. 1. Five-man networks used by Smith and Leavitt, with the relative centrality index of each position and net centrality. [Figure: Circle, each position 5.0, ΣC(x) = 25.0; Chain, a 4.0, b 5.7, c 6.7, d 5.7, e 4.0, ΣC(x) = 26.1; Wheel, center c 8.0 and a, b, d, e 4.6 each, ΣC(x) = 26.4; Y, ΣC(x) = 26.2.]

Leavitt, in analyzing the effects of position within a network (see Table 2), finds that the most central position sends the most messages and the least central, the fewest. Subjects at the central position, moreover, enjoy their jobs more than those at peripheral positions. The relation of centrality to number of messages is to be expected, since the central positions had to serve as relays for messages from the peripheral members. Concerning the relation between position and morale, Leavitt (1951) offers the following explanation:

In our culture, in which needs for autonomy, recognition, and achievement are strong, it is to be expected that positions which limit independence of action (peripheral positions) would be unsatisfying (p. 48).


TABLE 1
SUMMARY OF SMITH'S DATA FROM BAVELAS (1950)

                                             Frequency of occurrence of
           Average total   Average incorrect recognized leader at position
Pattern    errors          completions       a     b     c     d     e
Circle     14.0            5.0               1     2     3     2     3
Chain       7.0            1.5               0     1    14     3     0

The dependent variables of the Smith and the Leavitt studies are the major concern of the subsequent network studies. The variables fall into four main classes: (a) efficiency (number of errors, correct completions, speed of solution, number of messages); (b) leadership (positions named as leader, agreement about leader); (c) morale (rating of group, rating of self); and (d) organization (consistency, type). These dependent variables and the two main independent variables, group structures and individual position within group structures, form the basic framework of the network studies.


VARIATIONS AND FURTHER ANALYSIS OF THE BASIC DESIGN


The work of Bavelas, Smith, and Leavitt proliferated into an abundance of network studies. The first of these was a study by Heise and Miller (1951), introducing the following variations in the original procedures: (a) Communication took place over an intercom system. The subjects could, therefore, listen or speak simultaneously to as many of the other subjects as the network permitted. (b) Communication content was highly restricted. The subjects could only relay the words on a given list. (c) The communication network included one-way as well as two-way channels. The five three-man structures used are presented in Figure 2. (d) Intensity of noise was varied over the networks.
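Once one-way channels are allowed, a network can be connected and still prevent some members' information from reaching others. The sketch below is an illustration added here; its three-man networks are hypothetical examples, not Heise and Miller's actual structures. It stores each position's outgoing channels and checks, by following channels (with relaying) from every position, whether every member can get information to every other member.

```python
from typing import Dict, List, Set

DiNet = Dict[str, List[str]]  # position -> positions it can send to


def reachable(net: DiNet, source: str) -> Set[str]:
    """Positions a message starting at source can reach, allowing relays."""
    seen = {source}
    stack = [source]
    while stack:
        x = stack.pop()
        for y in net.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def everyone_can_inform_everyone(net: DiNet) -> bool:
    """True if every position's information can reach every other position."""
    positions = set(net) | {y for ys in net.values() for y in ys}
    return all(reachable(net, x) == positions for x in positions)


# Hypothetical three-man examples.
one_way_circle = {"a": ["b"], "b": ["c"], "c": ["a"]}
one_way_chain = {"a": ["b"], "b": ["c"], "c": []}
two_way_chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
```

The one-way Circle passes with only three channels, whereas the one-way Chain fails because nothing the last member knows can leave that position.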


TABLE 2
SUMMARY OF LEAVITT'S (1951) FINDINGS FOR STRUCTURALLY DISTINCT POSITIONS

                                  Mean number of    Mean job
Pattern   Position                messages sent     enjoyment
Circle    a, b, c, d, & e         83.8              65.6
Chain     a & e                   26.2              34.5
          b & d                   171.3             76.2
          c                       82.4              78.0
Wheel     a, b, d, & e            28.1              31.2
          c                       102.8             97.0
Y         a & b                   25.9              47.5
          c                       79.8              95.0
          d                       63.8              71.0
          e                       25.6              31.0


Fig. 2. Three-man networks used by Heise and Miller. [Figure: Networks 1 to 5, including a Chain and Circles with one-way and two-way channels.]

Using a task in which the subjects had to reconstruct a master list of words on the basis of incomplete lists, Heise and Miller find that: (a) as the signal-to-noise ratio in the intercom channels was lowered, the number of words spoken, errors, and the time required to complete the task increased for all networks; (b) with increased noise, the differences between networks became more pronounced. Generally, inefficiency of performance, measured by either the number of words spoken or the time required to finish the task, increased from Network 1 to Network 5 in Figure 2. A second task, in which the subjects had to reconstruct a 25-word sentence based on parts given to each of them, gave similar results, except that Networks 2 and 3 were somewhat more efficient than Network 1 at the high noise levels. When, however, the subjects were given anagram problems in which communication between the subjects was not necessary, the results were as follows: intense noise decreased the number of words spoken, and there was no systematic difference between the efficiency of the various nets.

Aside from its introduction of a greater variety of channel arrangements, the main contribution of the study is probably that it demonstrates that no network is best in all situations. The efficiency of a structure depends on the characteristics of the task. Thus, in one of the first network studies, the complex interactions that will mar the apparent simplicity of the early findings already appear.

Guetzkow and Simon (1955) introduced the distinction between two classes of behavior in the network: direct problem-solving behavior, such as relaying information and asking questions; and organizational behavior, such as assigning roles and functions to team members. They hypothesize that communication restrictions affect only the ability of the group to organize; once the group is organized, the different structures are equally efficient in solving the problems. To test their hypothesis, they used three five-man networks: Circle, Wheel (see Figure 1), and All-Channel (see Figure 4). Under their variant of the network situation, a group member could send only coded problem information during trials, but could send any kind of message during the intertrial periods.

On the basis of the characteristics of the networks, Guetzkow and Simon predict that the Wheel should be highest in efficiency, the All-Channel intermediate, and the Circle lowest.

The Wheel groups would have the least difficulty, for they have no channels to eliminate, no relays to establish, and already have one person occupying a dominant position in the net. The All-Channel groups would have the next grade of difficulty, since the elimination of excess channels and the evolution of one person as solution-former are both required, yet relays need not be established. The Circle groups should have the most difficulty, for they need both to establish relays and to evolve an asymmetrical arrangement among the positions. They also must do some eliminating of unneeded channels, although this last requirement is minimal (p. 240).


Their findings on speed of problem solution (which also agree with Leavitt's contention concerning the Wheel and the Circle) bear out this prediction.
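The three requirements named in the quotation (a dominant position, elimination of unneeded channels, and establishment of relays) can be counted from a network's channels. The sketch below is an operationalization added here, not Guetzkow and Simon's: it assumes the group settles on a single solution-former, taken to be a position of highest degree, and a final organization that keeps only n − 1 channels. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Network = Dict[int, Set[int]]  # position -> positions sharing a two-way channel


def build(n: int, edges: Iterable[Tuple[int, int]]) -> Network:
    net: Network = {i: set() for i in range(n)}
    for a, b in edges:
        net[a].add(b)
        net[b].add(a)
    return net


def distances_from(net: Network, source: int) -> Dict[int, int]:
    """Minimal number of links from source to every position (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in net[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def organizing_requirements(net: Network) -> Dict[str, object]:
    """What a group must do to reach a one-hub organization with n - 1 channels."""
    n = len(net)
    channels = sum(len(nbrs) for nbrs in net.values()) // 2
    degree = {x: len(nbrs) for x, nbrs in net.items()}
    top = max(degree.values())
    candidates = [x for x in net if degree[x] == top]
    hub = candidates[0]  # arbitrary pick when the structure singles out no one
    dist = distances_from(net, hub)
    return {
        "dominant_position_given": len(candidates) == 1,
        "channels_to_eliminate": channels - (n - 1),
        "positions_needing_relay": sum(1 for x in net if dist[x] > 1),
    }


WHEEL = build(5, [(0, i) for i in range(1, 5)])
ALL_CHANNEL = build(5, combinations(range(5), 2))
CIRCLE = build(5, [(i, (i + 1) % 5) for i in range(5)])
```

The counts follow the quoted ordering: the Wheel needs nothing; the All-Channel must drop six channels and single out a solution-former but needs no relays; the Circle must set up relays for two positions and create an asymmetry, while dropping only one channel.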

They cite the following as evidence that the structures affect organizational efficiency: the interaction patterns were most stable (same channels consistently used) in the Wheel and least stable in the All-Channel, and the greatest degree of differentiation of function is found in the Wheel, the least in the Circle. They show, furthermore, that if only the stable groups of each network are compared, there are no longer differences in the speed of problem solution. They cite this finding as evidence that the communication restriction does not affect problem solving directly.

Guetzkow and Dill (1957) follow up this study with an investigation of what happens during the trial periods, in which communication was limited to exchange of coded information, and during the intertrial "organizing" periods. They reanalyze the Guetzkow-Simon data with respect to two factors, "local learning" (see Christie, Luce, & Macy, 1952, below), which occurs during the trials, and a "planning mechanism," which functions during the intertrial period. They conclude that the All-Channel shows the most planning activity while the Wheel shows the least, presumably because its organization is dictated by the communication net. They furthermore note that the Circle network is handicapped by the network restrictions in organizing itself during the intertrial period, whereas the All-Channel structure does not seem to have this difficulty.

In order to explore this point, Guetzkow and Dill obtained new data by running groups of subjects under an alternating structure condition. During the task trials, the groups were run as Circles. During the intertrial periods, all communication restrictions were removed by opening the barred channels, giving an All-Channel net. These new experimental groups are called Circle-All-Channel. Guetzkow and Dill (1957) hypothesize that

task performance in a restricted net will be equal to that in an unrestricted net, if the restrictions are removed during the intertrial period so that a relay system may be organized (p. 191).

An analysis of task trial times failed to support the hypothesis. Circle-All-Channel groups do not differ in performance time from the Circle groups in the earlier experiment. All-Channel groups were, moreover, significantly faster than Circle-All-Channel groups. The main contribution of the two studies above is their suggestion concerning the ways in which communication structure impedes the group's attempts to organize itself for its work.

Goldberg (1955) brings to the network study a new task, the unstructured group decision task, and a new dependent variable, influence (or, more precisely, "influenceability"). He hypothesizes that in group decisions, central positions in a network would be influenced less than peripheral positions. He placed subjects in the five-man Wheel, Y, and Chain and showed them a card bearing a number of dots. The subjects then communicated with each other and settled on an estimate of the number of dots. Influence, measured by the amount that a subject changed his initial estimate during the experimental session, is found to be negatively related to the centrality of the position only for the Y network. He finds, however, a positive relation between the centrality of a position and the number of leader nominations.

Trow (1957) develops a point made by Leavitt (1951, p. 49) into the hypothesis that centrality produces high morale and status not just because centrality implies greater access to communication channels, but because greater access to channels gives autonomy, the ability to make independent decisions. Trow argues that though centrality and autonomy are usually correlated, they can be separated experimentally, and that when they are separated, autonomy will be found to be the effective variable. He accomplished this separation by placing his subjects in apparent three-person chains and passing prepared notes to them to create the illusion of a group. Trow varied autonomy by giving some subjects a code book needed in planning the group's task and informing other subjects that someone else in the group had the code. He also gave the subjects a questionnaire to measure their need for autonomy.

The major findings are the following: autonomy produces a higher level of job satisfaction than does dependence; the effect of centrality upon satisfaction is not significant. The effect of autonomy holds primarily for the subjects high in need for autonomy. Trow concludes that "autonomy may be considered as mediating the observed relationship [found by Leavitt] between centrality and satisfaction" (p. 208). Predictions concerning a parallel effect of autonomy on self-ascribed status were not supported. Status was, however, affected by centrality.

TABLE 3
SYNOPSIS OF INITIAL AND FOLLOW-UP STUDIES

Investigator | Task | Network | Independent variables | Dependent variables | Findingsᵃ
Bavelas (Smith) (1950) | Determining common symbol | Chain, Circle (5-man) | Network; Position Centrality | accuracy; leader nomination | N→a: Ch>Cc; PC→ln: +
Leavitt (1951) | Determining common symbol | Chain, Circle, Wheel, Y (5-man) | Network; Position Centrality | speed; accuracy; leader nomination; morale; number of messages | N→s: 0ᵇ; N→a: Y>Cc; N→nm: Ch, Wl, Y<Cc; PC→ln: +
Heise and Miller (1951) | Reconstruction of word lists, sentences, anagrams | Chain, one-way and two-way channel Circles (3-man) | Network; Noise; Task; Network × Noise × Task interaction | speed; accuracy; number of words | Ns→nw: +; N×Ns×T→s, a, nw: +
Guetzkow and Simon (1955) | Determining common symbol | All-Channel, Circle, Wheel (5-man) | Network | speed; organizational stability; message content | N→s: Wl>A-Cl>Cc; N→o st: Wl>Cc>A-Cl
Guetzkow and Dill (1957) | Determining common symbol | All-Channel, Circle (5-man) | Network; Communication Restriction during Intertrial Organizing Period (Circle vs. Circle-All-Channel) | speed; message content | ComR→s: 0
Goldberg (1955) | Group decision on a number of dots | Chain, Wheel, Y (5-man) | Network; Position Centrality | "influenceability"; leader nomination | PC→infl: 0; PC→ln: +
Trow (1957) | Modified common symbol problem | Simulated Chain (3-man) | Position Autonomy; Position Centrality; Need for Autonomy | morale; status | PA→m: +; PA→st: 0; PC→m: 0; PC→st: +

ᵃ The abbreviations in the Findings column of this and the following synoptic tables are derived from the independent variable and the dependent variable. They read as follows:
N→s: + = Network (independent variable) has an effect on speed (dependent variable).
PA→m: 0 = Position Autonomy does not have an effect on morale.
If the independent variable is at least an ordinal measure, then the symbol + takes on added meaning, signifying the direction of the relationship. In these cases:
Ns→nw: + = Noise level is positively related to the number of words transmitted.
Ns→s: − = Noise level is negatively related to speed.
If the independent variable is a nominal measure, then the findings are abbreviated as follows:
N→s: Wl>A-Cl>Cc = Network affects speed, with Wheel faster than All-Channel, which is faster than Circle.
Inequalities in such findings are always given with the superior groups on the left. Thus,
N→a: Y>Cc = Network affects accuracy, with Y better than Circle,
but N→nm: Wl, Y, Ch<Cc = Network affects number of messages, with Wheel, Y, and Chain better (requiring fewer messages) than Circle.
ᵇ The interpretation of the finding does not agree with the investigator's.

The studies summarized in this section exemplify the major developments of the original theme: addition of new variables, e.g., noise, and analysis of the structural variables into psychological components. A synopsis of these studies and the initial network studies is presented in Table 3.


TESTING THE LIMITS OF THE BASIC DESIGN


Shaw has systematically worked the area opened up by Bavelas, extending the investigations to include such variables as amount and distribution of information, problem complexity, and type of leadership. He has also suggested additional concepts: independence of positions (rather than centrality) and saturation.

Fig. 3. Four-man networks used by Shaw. [Figure: panels include Wheel, Circle, Slash, Kite, and All-Channel.]

Shaw (1954a) extended the network investigation to four-man groups (see Figure 3). The names he assigns to his networks raise an interesting question. Could not the four-man "Wheel" also be called a "Y"? The question has importance in comparing results for networks that differ in size. There is no empirical or rational basis for matching results from a four-man and a five-man "Wheel." The only thing clear is that the number of distinct patterns decreases as the number of group members decreases. Therefore, although Chain, Wheel, and Y are distinct patterns for five-man groups, when the number of members is reduced by eliminating a peripheral member, only two of these three patterns remain: the four-man Chain and the Wheel-Y. If the number of members is reduced again, the two remaining networks coalesce into the simple three-man Chain. The difficulty caused by ignoring this characteristic will be pointed out later.
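The shrinking number of distinct patterns can be verified by brute force for networks like the Chain, Wheel, and Y, in which n members are joined by the minimum of n − 1 two-way channels. The sketch below is a check added here and is restricted to such minimal networks (an assumption; the Circle and All-Channel have extra channels). It enumerates every connected set of n − 1 channels and counts the sets that stay different under every relabelling of positions. The function names are hypothetical.

```python
from itertools import combinations, permutations
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[int, int]


def is_connected(n: int, edges: Tuple[Edge, ...]) -> bool:
    """True if every position can reach position 0 through the given channels."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = {0}
    stack = [0]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return len(seen) == n


def canonical_form(n: int, edges: Tuple[Edge, ...]) -> Tuple[Edge, ...]:
    """Smallest sorted edge list over all relabellings; equal only for same-shape nets."""
    best: Optional[Tuple[Edge, ...]] = None
    for perm in permutations(range(n)):
        relabelled = tuple(sorted(
            (min(perm[a], perm[b]), max(perm[a], perm[b])) for a, b in edges
        ))
        if best is None or relabelled < best:
            best = relabelled
    assert best is not None
    return best


def distinct_minimal_networks(n: int) -> int:
    """Number of structurally distinct connected n-man nets with n - 1 channels."""
    pairs = list(combinations(range(n), 2))
    shapes = set()
    for edges in combinations(pairs, n - 1):
        if is_connected(n, edges):
            shapes.add(canonical_form(n, edges))
    return len(shapes)
```

It reports three shapes for five members (Chain, Y, Wheel), two for four (Chain and Wheel-Y), and one for three (the Chain). The relabelling step grows factorially, which is harmless at these group sizes.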


Shaw finds that the centrality of position is related, as in the Leavitt study, to number of messages sent, satisfaction, and frequency of nomination as leader. He proposes, however, an alternative to centrality, the related concept of independence, $I$, and constructs a measure of it. Shaw then plots mean number of messages, morale, and recognition of leadership against $I$ for his own and Leavitt's data. $I$ appears to give plots that are more nearly monotonic than does centrality. The functions are, however, not only complicated, but also differ in form for presumably comparable Shaw and Leavitt data. For example, the equation relating morale to $I$ is logarithmic for Leavitt's data and linear for Shaw's. The need for a concept like independence in explaining behavior within the networks had been expressed by Leavitt (1951) and has received empirical support from Trow (1957). Shaw's $I$, however, is an awkward and complex combination of variables. Since he gives no statistical evaluation of the improvement of $I$'s fit of the data over the fit yielded by centrality, it is difficult to judge whether $I$'s smoother curves compensate for its greater complexity. (The comparability of Leavitt's and Shaw's data is also of concern. Shaw told his subjects what the network structure was; Leavitt and the investigators following his procedures did not. Other aspects of the presumed comparability of data are discussed later, e.g., Shaw, 1954b.)

The data Shaw considered above were drawn from a separately published study (1954c) aimed at testing the hypothesis that the distribution of information affects the behavior of networks. Since the more central positions usually have more information than the other team members during the major part of a trial, the effects of centrality and amount of information are ordinarily confounded. Shaw, therefore, varies the amounts of information initially given to positions within three networks. In this way, he separates to some extent the effects of the two variables.

He uses the three four-man communication patterns depicted in Figure 3: Circle, Wheel (or Y), and Slash. The groups solved arithmetic problems for which each team member held some of the necessary information. In some teams, all members had the same amount of information. In other teams, the information was unequally distributed, with the positions marked a in Figure 3 receiving more information than the others.

He finds that central positions and the positions with the larger amount of initial information tended to solve the problems more quickly. There were no significant effects of networks or distribution of information conditions on network speed. Here, and in the following studies, Shaw centers much of his data analysis on the higher order interactions, e.g., network × information distribution × trials. Since his hypotheses and his conclusions are not at this level, attention will be given primarily to main effects.

The results on number of messages as related to network (Circle versus Wheel) and as related to position centrality agree with Leavitt's findings for five-man groups. In general, the Wheel required fewer items to reach a solution than the Slash or Circle, and central positions sent more messages than peripheral positions. Shaw also finds that positions given more information sent more items than did the same positions under equal distribution of information.

What is the meaning of a relation between the number of messages (a measure used by Leavitt, Shaw, and the investigators who follow them) and position differences? Since the different positions have to send different minimum numbers of messages to complete a trial, it is not very enlightening to note that differences appear. In a five-man Chain with each man holding one item of information, the end men have to send only one message in order to assure complete distribution of their information. The central man has to transmit five items. It is necessary to relate the number of messages sent by a position to the minimum for the position. Otherwise, it is as if an experimenter reported significant differences in the number of responses by two experimental groups when one group of subjects is requested to name two items each, the other requested to name only one.
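The minimum-requirement argument can be made computational. The sketch below is an illustration added here, under stated assumptions: each member starts with one item, the group must achieve complete distribution, and the network is a tree (one route between any two positions), as the Chain, Wheel, and Y are. It counts how many distinct items each position must pass on at least once, which gives the end men of the Chain one item and the central man five, and then expresses observed counts as multiples of that minimum. The function names and the observed counts in the example are hypothetical.

```python
from typing import Dict, List, Set

Network = Dict[str, List[str]]  # position -> positions sharing a two-way channel


def side_of_link(net: Network, x: str, y: str) -> Set[str]:
    """Positions reachable from x without passing through y (x's side of the x-y link)."""
    seen = {x}
    stack = [x]
    while stack:
        u = stack.pop()
        for v in net[u]:
            if v != y and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def min_items_to_pass_on(net: Network) -> Dict[str, int]:
    """Distinct items each position must pass on for everyone to get every item.

    Assumes a tree network and one starting item per position, named after it.
    """
    needs: Dict[str, int] = {}
    for x, neighbours in net.items():
        outgoing: Set[str] = set()
        for y in neighbours:
            # In a tree, items on x's side of the x-y link reach y only through x.
            outgoing |= side_of_link(net, x, y)
        needs[x] = len(outgoing)
    return needs


def relative_to_minimum(observed: Dict[str, float], minimum: Dict[str, int]) -> Dict[str, float]:
    """Observed counts expressed as multiples of each position's own minimum."""
    return {x: observed[x] / minimum[x] for x in observed}


CHAIN: Network = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

# Hypothetical counts, for illustration only: the end man sent 3 items, the center 10.
example = relative_to_minimum({"a": 3, "c": 10}, min_items_to_pass_on(CHAIN))
```

Under this counting rule b and d also need five items, and in the hypothetical example the end man, not the center, sends the most relative to his minimum (3.0 versus 2.0 times), which is the kind of comparison raw counts hide.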

Shaw does not find that differences in network affect the number of errors, although unequal distribution of information lowers the number of errors significantly. On the other hand, leadership, measured in terms of preference in a sociometric questionnaire, was related to centrality but not to information distribution. Similarly, group morale measures and individual morale or job satisfaction ratings were, as in the Leavitt study, related to centrality. They were not, however, related to information distribution.

Gilchrist, Shaw, and Walker (1954) 
explored the effect of distribution of 
information further by giving addi- 
tional information not only to pe- 
ripheral, but also to central positions 
in the four-man wheel. Their three 
experimental conditions consisted of 
an equal distribution of information, 
an unequal distribution to the pe- 
riphery (one peripheral subject re- 
ceiving more information than the 
others), and an unequal distribution 
to the center (the center subject 
receiving more information). Dis- 
tribution of information did not have 
any significant effect on overall group 
performance as seen in time scores, 
error scores, sociometric choices, num- 
ber of message units, and leadership 
emergence. It did have an effect on 
behavior at individual positions. In- 
creasing the initial information, in 
general, decreased the time scores 
and increased the number of mes- 
sages transmitted, job satisfaction, 
and position status rating. The in- 
vestigators’ expectations concerning 
the order of the time scores are not 
met. The central position with addi- 
tional information has a higher time 
score than the peripheral positions 
with additional information and also 
a higher time score than the central 
position under equal information 
distribution. Primarily to explain 
the latter result, they introduce the 
concept of saturation, defined as the 
input and output requirements which 
are imposed on positions within a 
group structure. The concept, a 
promising one which suggests that 
communication requirements may 
counteract the effects of centrality,
is explored in a subsequent investiga- 
tion (Shaw, 1955). 

Shaw (1956) investigated the effect 
of another aspect of information dis- 
tribution in communication networks: 
random versus systematic distribu- 
tion. In solving an arithmetic prob- 
lem consisting of four distinct steps, 
each member of a four-man group 
may have all the information items 
necessary to complete one of the 
steps. This is called systematic dis- 
tribution. A random distribution is 
one in which each of the information 
items is assigned at random; a 
member, therefore, usually has to go 
to several sources (other members) 
for the information to complete one 
step of the problem. This type of ex- 
perimental operation brings the net- 
work study close to the situations 
used by Lanzetta and Roby in their 
manipulation of “dispersion of information
sources” (see below). Shaw
predicts that systematic distribution 
will increase efficiency and job satis- 
faction and that the increase will be 
greater if the subjects are informed 
about the system of distribution and 
if the network permits freedom of 
action—e.g., All-Channel as com- 
pared with Wheel. Analysis of time 
to solution in the Wheel and All- 
Channel networks, in part, supports 
Shaw’s predictions. Knowledge of 
distribution is not, however, signifi- 
cant as a main effect or in interaction 
with distribution of information. 
Networks, also, do not differ signifi- 
cantly on the time measures. 
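
Counting how many members hold the items needed for each step makes the difference between the two assignment schemes concrete. In the Python sketch below, the four members and four steps come from the description above. The three items per step, the uniform random assignment, and all function names are assumptions made for illustration. The closed-form value is the standard expected number of distinct holders when k items are each given to one of M members at random.

```python
import random

N_MEMBERS = 4        # four-man group (from the text)
N_STEPS = 4          # four distinct steps in the problem (from the text)
ITEMS_PER_STEP = 3   # hypothetical: the source does not give this number


def systematic_holders():
    """Member s holds every item needed for step s."""
    return {(s, i): s for s in range(N_STEPS) for i in range(ITEMS_PER_STEP)}


def random_holders(rng):
    """Each item is assigned to a member chosen uniformly at random."""
    return {(s, i): rng.randrange(N_MEMBERS)
            for s in range(N_STEPS) for i in range(ITEMS_PER_STEP)}


def sources_per_step(holders):
    """Number of distinct members holding the items for each step."""
    return [len({holders[(s, i)] for i in range(ITEMS_PER_STEP)})
            for s in range(N_STEPS)]


def mean_sources_random(trials=20000, seed=0):
    """Monte Carlo average of sources per step under random assignment."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(sources_per_step(random_holders(rng)))
    return total / (trials * N_STEPS)


# Expected distinct holders of k items spread over M members: M * (1 - (1 - 1/M) ** k).
expected = N_MEMBERS * (1 - (1 - 1 / N_MEMBERS) ** ITEMS_PER_STEP)

print(sources_per_step(systematic_holders()))  # [1, 1, 1, 1]
print(expected)                                 # 2.3125
print(round(mean_sources_random(), 2))          # close to the expected value
```

With three items per step, random distribution spreads the items for a step over about 2.3 members on average. Systematic distribution places them all with exactly one member.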

Another follow-up (Shaw, 1954b) 
of the distribution of information 
study by Shaw (1954c) attempts to 
reconcile an apparent discrepancy 
between his and Leavitt’s (1951)
study. Leavitt found some evidence 
that the five-man Wheel network 
solves problems faster than the five- 
man Circle. Shaw’s four-man Circles 
are somewhat faster than his four-man
Wheels. The speed difference is,
however, not statistically significant. 
Shaw suggests that the difference 
stems from the difference in the com- 
plexity of the problems used. This is 
essentially Heise and Miller’s (1951) 
point that different structures will be 
best for different tasks. Shaw’s 
(1954b) main hypothesis is 


that a communication net in which all Ss are 
in equal positions (the circle) will require less 
time to solve relatively complex problems but 
more time to solve relatively simple problems 
than will a communication net in which one S 
is placed in a central position (the wheel) 
(p. 211). 


To test this hypothesis, Shaw gave 
simple (common letter) and complex 
(arithmetic) problems to the three- 
man Wheel and Circle (Networks 3 
and 1, respectively, in Figure 2). 
Two points should be noted con- 
cerning his structures: they are three- 
man groups not five-man groups, as 
in the Leavitt study, and not four- 
man groups, as in the other Shaw 
study; the question raised earlier 
concerning the naming of networks 
may be raised again. Why is the 
third pattern in Figure 2 called a 
“Wheel” rather than a “Chain”?
With these points in mind, it seems 
unlikely that differences in the results 
of a study of five-man groups and a 
study of four-man groups can be resolved
by a study of three-man groups.
Resolution is especially unlikely since 
the Chain, which, according to Leavitt, 
tends to be slower than the Circle, 
and the Wheel, which tends to be 
faster than the Circle, reduce to a 
single network in the three-man 
group. Shaw identifies this network 
with the fast Wheel. It could just as 
well be identified with the slow Chain.
In any case, Shaw's main hypothesis 
is that the interaction of problem 
complexity with network has an effect 
on solution time. The results tend to
support his prediction, but are not 
statistically significant. Analysis of 
the number of items communicated 
and errors does not add any support 
to the hypothesis. 
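
The objection that the three-man Wheel and Chain reduce to a single network can be checked mechanically. The Python sketch below represents each network as a set of two-way channels and tests by brute force whether some relabeling of the positions turns one network into the other. The two-way encoding and the function names are assumptions of this sketch.

```python
from itertools import permutations


def wheel(n):
    """Position 0 is the hub, with a two-way channel to each other position."""
    return {frozenset((0, k)) for k in range(1, n)}


def chain(n):
    """Positions 0, 1, ..., n - 1 joined in a line by two-way channels."""
    return {frozenset((k, k + 1)) for k in range(n - 1)}


def circle(n):
    """A chain whose two end positions are also joined."""
    return chain(n) | {frozenset((0, n - 1))}


def isomorphic(g, h, n):
    """Brute force: does some relabeling of the n positions turn g into h?"""
    for perm in permutations(range(n)):
        relabeled = {frozenset(perm[x] for x in edge) for edge in g}
        if relabeled == h:
            return True
    return False


for n in (3, 4, 5):
    print(n, isomorphic(wheel(n), chain(n), n), isomorphic(wheel(n), circle(n), n))
# 3 True False
# 4 False False
# 5 False False
```

The hub-and-spokes pattern coincides with the line only at three positions. From four positions on, the Wheel and the Chain are distinct networks.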

The problem complexity with net 
centrality interaction is pursued in 
one more study in which Shaw (1958)
manipulates complexity by the addi- 
tion of irrelevant information to 
arithmetic problems given to the 
four-man All-Channel and Wheel. 
The evidence is again unclear. A 
significant effect of the interaction is 
found on the number of messages but 
not on time to solution. 

Shaw (1955) has better luck with a 
study of the effect of saturation and 
independence. He elaborates these 
concepts and through them arrives 
at the variables of the classic Lewin, 
Lippitt, and White study (1939): 
autocratic versus democratic leader- 
ship. He assumes that the leader’s 
style affects both saturation and in- 
dependence: “autocratic” leaders de- 
crease both the independence and 
saturation of the followers and “democratic”
leaders increase both. Independence is assumed to improve
performance and morale, with a 
greater effect on morale. Saturation 
is assumed to lower performance and 
morale, with a greater effect on per- 
formance. From these assumptions, 
Shaw derives the following predic- 
tions: autocratic leaders will promote 
better performance than democratic 
leaders, autocratic leaders will cause 
poorer morale, and differences be- 
tween central and peripheral posi- 
tions will be accentuated by auto- 
cratic leadership. 
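
Shaw's argument can be laid out as a small derivation. This requires an assumption he does not state in this form: that performance P and morale M depend linearly on independence I and saturation S, with the stated differences in effect size expressed as inequalities between coefficients:

$$
P = \alpha_P I - \beta_P S, \qquad M = \alpha_M I - \beta_M S, \qquad \alpha_M > \alpha_P > 0, \quad \beta_P > \beta_M > 0 .
$$

Suppose autocratic leadership lowers independence by $\delta_I > 0$ and saturation by $\delta_S > 0$ relative to democratic leadership. The autocratic-minus-democratic differences are then

$$
\Delta P = -\alpha_P \delta_I + \beta_P \delta_S, \qquad
\Delta M = -\alpha_M \delta_I + \beta_M \delta_S, \qquad
\Delta P - \Delta M = (\alpha_M - \alpha_P)\,\delta_I + (\beta_P - \beta_M)\,\delta_S > 0 .
$$

Within this sketch, the stated assumptions guarantee only that autocratic leadership favors performance relative to morale. The predicted signs, $\Delta P > 0$ and $\Delta M < 0$, additionally require $\beta_P \delta_S > \alpha_P \delta_I$ and $\alpha_M \delta_I > \beta_M \delta_S$.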

The two leadership conditions were 
used with the four-man Wheel, Kite, 
and All-Channel (see Figure 3) solv- 
ing arithmetic problems. The sub- 
ject at Position b in the network was 
assigned the role of leader and was 
instructed to be either “autocratic” 
or “democratic” in his handling of
directions and suggestions. As Shaw 
predicted, the autocratic groups are 
higher in efficiency and lower in 
morale. Although analysis of data 
for individual positions confirms pre- 
vious findings that the central posi- 
tions solve the problems more quickly, 
send more messages, and have higher 
morale, it does not confirm Shaw’s 
prediction that autocratic leadership 
will increase the difference on these 
measures between central and pe- 
ripheral positions. It might be added 
that in the Lewin, Lippitt, and White 
study, autocratic and democratic 
leadership styles generated an analog 
of the Wheel and the All-Channel, 
respectively. Under autocratic lead- 
ership, for example, most of the 
group’s communications were directed
at the leader. Shaw’s study, there- 
fore, may be viewed as involving two 
types of manipulation of communica- 
tion structure: direct manipulation 
through the elimination of channels 
and indirect manipulation through 
the effects of the leader’s style. 

In the preceding studies by Shaw 
and his associates, the groups worked 
two to four problems. The effect of 
prolonged experience was investi- 
gated by Shaw and Rothschild (1956). 
Groups in the Wheel, Slash, and All- 
Channel structures (see Figure 3) 
solved two arithmetic problems a day 
for 10 days. The usual analyses are 
made of time scores, number of mes- 
sage units transmitted, and satis- 
faction ratings. The results, to some 
extent, agree with the results of previ- 
ous studies (Shaw, 1954c, 1955). 

The merging, seen in the study on 
leadership style, of Shaw’s network 
investigations with the more con- 
ventional social psychological tradi- 
tion continues in a study by Shaw, 
Rothschild, and Strickland (1957) in 
which they use unstructured group 
discussion tasks. Each member of the
group starts with all the information 
required for a decision. The group 
members have to interact only to 
reach an agreement on the solution. 
The networks differ significantly in 
the time required to reach a decision. 
The Wheel requires the longest time 
and the All-Channel, the shortest. 
The finding agrees, to some extent, 
with Shaw and Rothschild’s findings 
on the same networks solving arith- 
metic problems. Two other experi- 
ments reported in this article investi- 
gate the effect of the position within 
a network upon the ability of an in- 
dividual to maintain nonconforming 
opinion. These experiments are 
similar to Goldberg's experiment 
(1955). The results, in general, indi- 
cate that the amount of change that 
a subject is willing to make is a func- 
tion of the amount of support and 
opposition he faces rather than any 
position characteristic. The data on 
the relation between position cen- 
trality and tendency to be influenced 
do not, however, permit clear inter- 
pretation. Goldberg, it may be 
recalled, finds no overall relation be- 
tween centrality and tendency to be 
influenced. 

In summary, Shaw and his associ- 
ates have exhaustively worked the 
area opened by Bavelas and Leavitt. 
They have also introduced new con- 
cepts, e.g., independence and satura- 
tion, which are worth further exami- 
nation. Their work forms a major 
body of data concerning the effect of 
structure on group behavior. An
overall summary of these findings is 
presented in Table 4. A glance at the 
variables employed in successive 
studies indicates that the area has 
been worked not only exhaustively, 
but to exhaustion. 

After a promising start, the ap- 
proach has led to many conflicting 
results that resist any neat order. 
Perhaps more significant as a symptom of
morbidity is the lack of new hypotheses. The
lack is seen in the regression to nonstructural
independent variables: leadership style,
conformity pressure.

Fig. 4. Five-man networks used by Christie, Luce, and Macy. Arrows indicate one-way channels. (The “Triple Wheel” is called “Wheel” by Christie et al.)


MATHEMATICAL ANALYSIS


Christie, Luce, and Macy and their
associates have carried out an intensive
program of investigation of behavior in
group networks. They have emphasized
“pure” structural characteristics and have
subjected their data to detailed mathematical
and logical analysis. The full range of
their approaches to network behavior
is set forth in two reports (Christie
et al., 1952; Luce, Macy, Christie, &
Hay, 1953). In the first report, they
discuss the various aspects of network
behavior extensively, and analyze data
obtained in a series of experimental studies.

One of the studies was concerned with the
effects of learning on performance in the
networks in Figure 4. Christie (1954) later
published results for the five-man Circle,
Chain, All-Channel, and Pinwheel.³ The groups
solved a series of 25 list reconstitution
problems like those in the Heise-Miller (1951)
study. An “action quantization” restriction
was imposed in order to simplify the data
for analysis:


The subjects were required to send single-address
messages at prescribed times, to include in their
messages all problem information known to them at
the time, and to write nothing other than problem
information. . . . Thus, each message-sending action,
hereinafter called an act, was a simultaneous sending
by the whole group (p. 189).


³ This section draws both from results presented
in the larger report (Christie et al., 1952) and
from the separate report by Christie (1954).


TABLE 4
SYNOPSIS OF SHAW STUDIES

Shaw (1954c)
  Task: Arithmetic problems
  Network: Circle, Slash, Wheel (4-man)
  Independent variables: Distribution of Information (Equal vs. Unequal); Network; Position Centrality; Position Information (High vs. Low Information at a Given Position)
  Dependent variables: speed; accuracy; number of messages; leader nomination; morale
  Findings: DI-s: 0; DI-a: Unq>Eq; DI-nm: 0; DI-ln: 0; DI-mr: 0; N-s: 0; N-a: 0; PI-s: +; PI-[illegible]; PI-ln: 0; PI-mr: 0

Gilchrist, Shaw, and Walker (1954)
  Task: Arithmetic problems
  Network: Wheel (4-man)
  Independent variables: Distribution of Information (Equal vs. Unequal Peripheral vs. Unequal Central); Position Centrality; Position Information
  Dependent variables: speed; accuracy; number of messages; leader nomination; morale
  Findings: DI-s: 0; DI-a: 0; DI-nm: 0; PC-s: +; PC-a: 0; PC-nm: +; PC-ln: +; PC-mr: +; PI-s: +; PI-ln: 0; PI-mr: +

Shaw (1956)
  Task: Arithmetic problems
  Network: All-Channel, Wheel (4-man)
  Independent variables: Distribution of Information (Systematic vs. Random); Knowledge of Information Distribution; Network; Problem Difficulty
  Dependent variables: speed; accuracy; number of messages; morale
  Findings: DI-s: Sys>Rdm; DI-nm: 0; DI-mr: Sys>Rdm; KID-s: 0; KID-nm: 0; KID-mr: 0; N-s: 0; N-nm: Wl<A-Cl; N-mr: A-Cl>Wl

Shaw (1954b)
  Task: Common letter (simple) and arithmetic (complex) problems
  Network: Circle, Wheel (3-man)
  Independent variables: Problem Complexity × Network Interaction; Network; Problem Complexity
  Dependent variables: speed; accuracy; number of messages
  Findings: Comp×N-s: 0; Comp×N-a: 0; Comp×N-nm: 0

Shaw (1958)
  Task: Arithmetic problems
  Network: All-Channel, Wheel (4-man)
  Independent variables: Network; Irrelevant Problem Information; Network × Irrelevant Information Interaction
  Dependent variables: speed; number of messages; morale
  Findings: N-s: A-Cl>Wl; N-nm: Wl<A-Cl; N-mr: A-Cl>Wl; II-s: [illegible]; II-nm: 0; II-mr: 0; N×II-s: 0; N×II-nm: +; N×II-mr: 0

Shaw (1955)
  Task: Arithmetic problems
  Network: All-Channel, Kite, Wheel (4-man)
  Independent variables: Leadership Style (Autocratic vs. Democratic); Network; Position Centrality; Position Centrality × Leadership Style Interaction
  Dependent variables: speed; accuracy; number of messages; morale; leader nomination
  Findings: LS-s: Auto>Demo; LS-a: Auto>Demo; LS-nm: Auto<Demo; LS-mr: Demo>Auto; N-s: 0; N-nm: Wl<Kt<A-Cl; N-mr: A-Cl>Kt>Wl; PC×LS-s: 0; PC×LS-nm: 0; PC×LS-mr: 0; [further entries illegible]

Shaw and Rothschild (1956)
  Task: Arithmetic problems
  Network: All-Channel, Slash, Wheel (4-man)
  Independent variables: Trials: Learning; Network; Position Centrality; [further entry illegible]
  Dependent variables: speed; number of messages; morale
  Findings: [mostly illegible]; Wl<A-Cl<Sl

Shaw, Rothschild, and Strickland (1957)
  Task: Group decisions about “human relations” problems; Estimation of [...]
  Network: All-Channel, Slash, Wheel (4-man)
  Independent variables: Network; Position Centrality
  Dependent variables: speed; number of messages; morale; “influenceability”
  Findings: N-s: A-Cl>Sl>Wl; N-nm: [illegible]; N-mr: 0; PC-s: 0; PC-nm: +; PC-mr: 0; PC-infl: 0


In their data analyses, these in- 
vestigators use the minimum number 
of acts in which it is possible for a 
network to complete its task as base- 
lines. The minimum possible number 
of acts is an important consideration 
which has been neglected in other 
network studies. All networks in 
Figure 4 have minima of three acts 
except for the Chain with a minimum 
of five. Chain-(X) and Circle-(X) 
are topologically the same as Chain 
and Circle, respectively. They differ 
from Chain and Circle only in the 
physical arrangement of the positions. 
The investigators computed the dis- 
tribution of the number of acts re- 
quired for completion on the assump- 
tion that the group members dis- 
tribute their information at random 
over their channels. It is not surpris- 
ing that comparison of the theoretical 
with the observed number of solutions
shows that the subjects do better
than chance from the start. Clear
differences in learning efficiency between
networks are also demonstrated.
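
The minimum number of acts can be found by exhaustive search under a simplified reading of action quantization. The Python sketch below assumes the following: each of the five members starts with one distinct item; in every act, each member sends everything he currently knows to exactly one neighbor; and the task is complete when every member holds all five items. The network encodings (`CHAIN`, `CIRCLE`) and the function name belong to this sketch and are not taken from Figure 4. Under these assumptions, the search reproduces the stated minima for the Chain and the Circle.

```python
from itertools import product

# Five positions, 0..4; each list names the positions a member may address.
CHAIN = [[1], [0, 2], [1, 3], [2, 4], [3]]
CIRCLE = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]


def min_acts(adjacency, max_acts=8):
    """Fewest quantized acts after which every member holds every item.

    Assumed model: each member starts with one distinct item; in each act
    every member sends one message, to one neighbour, containing everything
    he knew at the start of that act.
    """
    n = len(adjacency)
    full = frozenset(range(n))
    frontier = {tuple(frozenset([i]) for i in range(n))}
    for act in range(1, max_acts + 1):
        next_frontier = set()
        for state in frontier:
            for targets in product(*adjacency):  # one addressee per member
                new = [set(k) for k in state]
                for sender, receiver in enumerate(targets):
                    new[receiver] |= state[sender]
                new_state = tuple(frozenset(k) for k in new)
                if all(k == full for k in new_state):
                    return act
                next_frontier.add(new_state)
        frontier = next_frontier
    return None  # not solvable within max_acts


print(min_acts(CHAIN), min_acts(CIRCLE))  # 5 3
```

A brute-force search of this kind grows quickly for the All-Channel network, which has 1,024 addressing patterns per act. A simple counting bound covers that case: the number of members holding any given item can at most double in one act. Five members therefore need at least three acts in any network.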

Christie (1954) summarizes the 
results for four of the networks as 
follows: 

Groups using the totally connected net- 
work [All-Channel] do somewhat better than 
random but show a negligible amount of 
learning. The groups in chain learn well, but 
their performance is good only with respect 
to the chain minimum of five acts per trial. 
The high minimum for this network makes 
its absolute performance poor in comparison 
to each of the other networks. The pinwheel 
network performs somewhat better than 
random, and its random distribution is a 
favorable one [i.e., the mode is close to the 
minimum]. Like totally connected it learns 
little, so that its final distribution is prac- 
tically the same as that in totally connected. 
Circle is the one very different case; it
achieves the best distribution [i.e., it fre- 
quently completes the task in the minimum 
possible] in the final block as a result of excel- 
lent learning (p. 193). 


Christie, Luce, and Macy introduce 
the concept of “locally rational” be- 
havior to explain the differences be- 
tween networks on the basis of be- 
havior at the individual positions. 
Locally rational behavior is the 
tendency to send successive messages 
to different stations so as to maximize 
the amount of new information re- 
ceived by neighboring positions. “. . . 
the behavior called for depends only 
on each subject’s attending to condi- 
tions immediate to his own position, 
i.e., to whom he has sent and from 
whom he has received” (Christie, 
1954, p. 195). The investigators used 
Monte Carlo runs on a computer in 
order to obtain the theoretical dis- 
tribution of the number of acts to 
completion under both the equiprob- 
able random behavior and the locally 
rational behavior model for each net- 
work. In a final summary of their 
work, Christie, Luce, and Macy 
(1956) show that in successive trials, 
network performance (the distribution
of number of acts) approaches
more and more closely that of the
locally rational model. It is not clear,
however, whether this generalization
holds for the All-Channel network.
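
Christie, Luce, and Macy obtained their theoretical distributions from Monte Carlo runs. The Python sketch below illustrates the general technique for the five-man Circle under a simplified act model, which is an assumption of this sketch: each member starts with one item, and in every act each member sends all he knows to one neighbor. The `equiprobable` rule follows the text. The `locally_rational` rule is a hypothetical stand-in that prefers the neighbor addressed least often. The passage describes the idea of locally rational behavior but not its exact specification.

```python
import random
from collections import Counter

# Five-man Circle, positions 0..4, each linked to its two neighbours.
CIRCLE = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]


def equiprobable(member, neighbours, sent_counts, rng):
    """Random model: every neighbour is equally likely to be addressed."""
    return rng.choice(neighbours)


def locally_rational(member, neighbours, sent_counts, rng):
    """Hypothetical reading of 'locally rational' behavior: address a neighbour
    this member has sent to least often, so successive messages go to different
    stations. Only the member's own sending history is consulted."""
    fewest = min(sent_counts[member][nb] for nb in neighbours)
    return rng.choice([nb for nb in neighbours if sent_counts[member][nb] == fewest])


def acts_to_completion(adjacency, rule, rng):
    """Run one trial; in each act every member sends all he knows to one neighbour."""
    n = len(adjacency)
    full = frozenset(range(n))
    knowledge = [frozenset([i]) for i in range(n)]
    sent_counts = [Counter() for _ in range(n)]
    acts = 0
    while not all(k == full for k in knowledge):
        acts += 1
        targets = [rule(m, adjacency[m], sent_counts, rng) for m in range(n)]
        new = [set(k) for k in knowledge]
        for sender, receiver in enumerate(targets):
            new[receiver] |= knowledge[sender]
            sent_counts[sender][receiver] += 1
        knowledge = [frozenset(k) for k in new]
    return acts


def simulate(adjacency, rule, trials=2000, seed=0):
    """Monte Carlo estimate of the distribution of acts to completion."""
    rng = random.Random(seed)
    return Counter(acts_to_completion(adjacency, rule, rng) for _ in range(trials))


for name, rule in (("equiprobable", equiprobable), ("locally rational", locally_rational)):
    print(name, sorted(simulate(CIRCLE, rule).items()))
```

Comparing the two printed distributions with observed trial-by-trial distributions is the kind of test the authors describe. The specific rules here are illustrative only.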

In analyzing the learning of subjects
within the various networks,
they also pay attention to the importance
of differences in the probability
of various initial act patterns.
For example, in the Circle a minimum
solution is possible only with
an initial pattern of two mutual
interchanges and one unreciprocated
message (e.g., a with b, c with d, and
e to a). In the Chain, any initial pattern
of acts can result in a minimum
solution.
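
The claim about the Circle's opening acts can be probed by exhaustive search. The Python sketch below labels positions a through e as 0 through 4. It assumes that each member begins with one item and sends everything he knows to one neighbor per act. For a given first act, it asks whether a three-act solution is still possible. The sketch checks the two-interchange example quoted above and, for contrast, an opening in which everyone sends clockwise. It then tallies all 32 openings so the general claim can be inspected. All names in the code are hypothetical.

```python
from collections import Counter
from itertools import product

# Five-man Circle: positions 0..4 stand for a..e, each joined to two neighbours.
CIRCLE = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
FULL = frozenset(range(5))
START = tuple(frozenset([i]) for i in range(5))


def apply_act(knowledge, targets):
    """One quantized act: member i sends everything he knows to targets[i]."""
    new = [set(k) for k in knowledge]
    for sender, receiver in enumerate(targets):
        new[receiver] |= knowledge[sender]
    return tuple(frozenset(k) for k in new)


def can_finish(knowledge, acts_left):
    """Exhaustive search: can every member hold every item within acts_left acts?"""
    if all(k == FULL for k in knowledge):
        return True
    if acts_left == 0:
        return False
    return any(can_finish(apply_act(knowledge, t), acts_left - 1)
               for t in product(*CIRCLE))


def opening_allows_minimum(first_act, minimum=3):
    """Does this first act still leave a solution in the minimum number of acts?"""
    return can_finish(apply_act(START, first_act), minimum - 1)


def mutual_interchanges(first_act):
    """Number of pairs of members who addressed each other in the first act."""
    return sum(1 for i, j in enumerate(first_act) if first_act[j] == i) // 2


# a<->b, c<->d, e->a: two mutual interchanges and one unreciprocated message
print(opening_allows_minimum((1, 0, 3, 2, 0)))  # True
# everyone sends clockwise: no interchanges
print(opening_allows_minimum((1, 2, 3, 4, 0)))  # False

# Tally all 32 possible openings by (interchanges, minimum still possible).
print(sorted(Counter((mutual_interchanges(o), opening_allows_minimum(o))
                     for o in product(*CIRCLE)).items()))
```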

They find another stimulus for
experimental work and analysis in
information theory concepts. In a 
study on coding noise presented in 
the main technical report (Christie 
et al., 1952) and published separately 
by Macy, Christie, and Luce (1953), 
they examine the effects of ambiguity 
of stimuli, interpreted as semantic 
noise. (Heise and Miller, 1951, stud- 
ied the effect of acoustic noise in the 
communication channel.) They used 
the five-man Circle, Chain, Wheel, 
and Pinwheel (see Figures 1 and 4). 
Another variable, labeled “feedback,” 
is introduced by giving some Wheel 
groups additional information con- 
cerning errors at the end of each 
trial. 

The groups’ task was to discover 
the color of a marble that all held in 
common. Fifteen problems with 
marbles of clearly identifiable color 
were followed by 15 trials with am- 
biguous stimuli—marbles of mixed, 
indistinct color. The authors state 
that the data on speed and number of 
messages agree with Leavitt's results. 
Their main findings on learning to 
handle the ambiguous stimuli do not 
give a simple picture. The Circle 
reduces its errors markedly over suc- 
cessive trials. The other structures 
do not. The explanation of these 
results may lie in Shaw's (1954b) 
hypothesis, as yet undemonstrated, 
that centralized structures are handi- 
capped on complex problems. Intro- 
duction of additional feedback in the 
Wheel network seems to improve
performance somewhat. 

They pursue their informational 
analysis with an estimate of “condi- 
tional receiver entropy” based on the 
number of different marbles called 
by the same name. They point out 
that the method by which the effi- 
cient networks reduce ambiguity 
seems to be by an increase in re- 
dundancy (computed in terms of the 
number of extra names given to a
marble). The behavior of the net- 
works is further analyzed qualita- 
tively in terms of error feedback (the 
opportunity of the members to ob- 
tain the same information from at 
least two different sources) and the 
opportunity to correct errors (the 
presence of symmetric, i.e., two-way 
channels). The Pinwheel lacks the 
latter, while the Wheel, and to some 
extent the Chain (in its end mem- 
bers), lacks the former. The presence 
of both, the investigators argue, is 
necessary for optimal performance. 
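
Both analyses in this paragraph can be sketched in Python. The entropy function uses the textbook definition of the conditional entropy of the marble given the name used for it. Whether this coincides with the estimator Macy, Christie, and Luce actually computed is an assumption. The second function encodes the two qualitative criteria described above: at least two sources, and two-way channels. It applies them to a network given as directed channels. The naming data, the network encodings, and the function names are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy_bits(observations):
    """H(marble | name) in bits from a list of (marble, name) observations.

    Zero when every name picks out a single marble; larger when one name is
    applied to more marbles, or spread more evenly over them.
    """
    joint = Counter(observations)
    name_totals = Counter(name for _, name in observations)
    total = len(observations)
    h = 0.0
    for (marble, name), count in joint.items():
        h -= (count / total) * math.log2(count / name_totals[name])
    return h


def feedback_and_correction(channels):
    """For each position, report (error_feedback, error_correction).

    channels is a set of directed (sender, receiver) pairs.
    error_feedback: the position can receive from at least two different sources.
    error_correction: every channel into the position is matched by one back out.
    """
    positions = sorted({p for pair in channels for p in pair})
    report = {}
    for p in positions:
        sources = {s for s, r in channels if r == p}
        report[p] = (len(sources) >= 2, all((p, s) in channels for s in sources))
    return report


def two_way(links):
    """Expand undirected links into channels in both directions."""
    return {(a, b) for a, b in links} | {(b, a) for a, b in links}


# Hypothetical naming data: one name per marble vs. one name for two marbles.
print(conditional_entropy_bits([("m1", "red"), ("m2", "blue")]))           # 0.0
print(conditional_entropy_bits([("m1", "greenish"), ("m2", "greenish")]))  # 1.0

wheel = two_way([("hub", p) for p in "abcd"])
chain = two_way([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])
one_way_cycle = {("a", "b"), ("b", "c"), ("c", "a")}  # hypothetical, not the Pinwheel
print(feedback_and_correction(wheel))
print(feedback_and_correction(chain))
print(feedback_and_correction(one_way_cycle))
```

In this encoding, the Wheel's peripheral positions and the Chain's end positions each have a single source, so they lack error feedback. A network built only from one-way channels lacks the opportunity to correct errors.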

Christie et al. (1952) try to carry
out detailed analysis and derivation 
of every type of data generated by 
the original network studies. They 
try to derive the distribution of group 
latency data on the basis of assump- 
tions concerning the individual la- 
tency distribution. They analyze the 
determinants of leader designation, 
using an index based on 
the relative frequency of use of a channel (on 
an equiprobable sending basis) and the 
mean input of the sending end of the channel 
as an estimate of the sending end’s value to 
the receiving end (p. 179). 


Their index fits the obtained values 
for two networks rather well. 
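
The following sketch illustrates how a group latency distribution can follow from individual ones. It rests on assumptions that are not necessarily those of Christie et al. Suppose a trial ends only when the slowest of the $n$ members finishes, and the members' latencies $T_1, \ldots, T_n$ are independent with distribution functions $F_1, \ldots, F_n$. The group latency $T_G = \max_i T_i$ then satisfies

$$
P(T_G \le t) = P(T_1 \le t, \ldots, T_n \le t) = \prod_{i=1}^{n} F_i(t),
$$

which reduces to $F(t)^n$ when all members share the same distribution $F$. In a network, the independence assumption is strained, because a member's latency depends on when messages reach him.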

They devote a similarly detailed 
analysis to the determinants of job 
satisfaction, following their general 
approach of using the individual 
position as a basis for prediction. An 
index called input potential which 
considers the input density for each 
position is found to be more highly 
correlated with job satisfaction than 
peripherality. The formula for input 
potential gives some idea of the level 
of the analysis. 


(The formula is not legible in the source; in it, I denotes mean input.)
In a subsequent report (Luce et al.,
1953), they carry out the same types
of analyses and also examine the additional
problem of the effects of change
of network structure, with subjects
trained in one network and tested in
another. The later work is even more
complex, involving a multiplicity of
approaches, and does not lend itself
readily to summary. Since their work
is primarily concerned with analytic
techniques, Table 5, which includes
only those empirical results that are
comparable to the results in the other
summary tables, does not do justice
to the full range of their work. The
general philosophy and major accomplishments
of their research effort are
summarized by Christie et al. (1956).


TABLE 5
SYNOPSIS OF CHRISTIE, LUCE, MACY STUDIES

Christie (1954); also in Christie, Luce, and Macy (1952)
  Task: Reconstruction of number list
  Network: All-Channel, Chain, Circle, Pinwheel (5-man)
  Independent variables: Network; Trials: Learning
  Dependent variables: amount of learning; number of acts to solution
  Findings: N-al: Cc, Ch>Pw, A-Cl

Macy, Christie, and Luce (1953); also in Christie, Luce, and Macy (1952)
  Task: Determining common ambiguous marble
  Network: Chain, Circle, Pinwheel, Wheel (5-man)
  Independent variables: Network; Additional Feedback in Wheel
  Dependent variable: accuracy
  Findings: N-a: Cc>Wl, Ch, Pw

The main tendency of the analysis 
by Christie, Luce, and Macy is away 
from functions involving overall
measures of the group, e.g., network
centrality, and toward the derivation of
the behavior of the group from that 
of the individual positions. Their 
efforts may be considered to parallel 
Shaw’s. Just as he carried the em- 
pirical work in the area as far as it 
can go, so do they carry the mathe- 
matical analysis to the limit. In both 
cases, it was desirable to have the 
job done. It seems unlikely now,
however, that the payoff will be 
commensurate with the energy and 
ingenuity that was invested. This 
could, of course, only be discovered
by the doing. 

With the efforts of both Shaw and
his associates and of Christie, Luce, 
and Macy carried as far as they can 
go, a new approach or new definition 
of the field seems necessary. The re- 
maining sections of this paper will 
review some of the attempts at re- 
analyzing or redefining the area. 


RETROSPECT AND PROSPECT 


At this point, it is appropriate to 
look back at the problem as originally 
stated and its expression in experi- 
mental form. 

Two main questions posed by 
Bavelas are the following: What 
effect does the structure of the group 
have upon its efficiency? What effect 
does position in the group have upon 
the subject’s morale and job satis- 
faction? There is no simple answer 
to the first question. The effect of 
structure depends in part on the re- 
quirements of the task (Heise & 
Miller, 1951). Contrary to Leavitt’s 
original generalization (1951), in a 
number of studies the highly cen- 
tralized structures are less efficient 
than other structures (Macy et al., 
1953; Shaw, 1958; Shaw et al., 1957). 
The answer to the second question is 
somewhat clearer. Morale seems to 
be a function of centrality of position. 
The psychological basis for this rela- 
tionship, however, warrants further 
analysis. Explanations have been 
offered in terms of autonomy (Trow, 
1957), independence (Shaw, 1954a), 
and input potential (Christie et al.,
1952). 

The unclear answers to the first 
question may arise from the peculiar 
experimental situation used to ex- 
press it. The characteristics of the 
original Bavelas-Leavitt situation 
that recommend it are its apparent 
experimental simplicity and rele- 
vance to real-life situations. Does the 
situation actually have these charac- 
teristics? That the situation is not 
simple is evidenced by the introduc- 
tion of techniques to simplify it fur- 
ther, e.g., action quantization. Even 
with the imposition of further restric- 
tions, however, a precise analysis of 
the activity of the groups is unman- 
ageably complex. 

That the situation is far distant 
from most familiar real-life situations 
can be seen by reviewing the special 
characteristics of the laboratory net- 
works. They are the following: 


1. Interdiction of certain channels. This 
is the most obvious of the special characteris- 
tics of the laboratory networks. To some ex- 
tent, this corresponds with conditions in 
natural groups. Some communication chan- 
nels are frequently closed to members of 
groups. For example, a man may not be per- 
mitted to go over the head of his immediate 
supervisors in a work group, or he may be 
unwilling to make certain statements when 
another member is present. 

2. Ignorance concerning other positions. 
This is probably both the effective and the 
really unique aspect of the communication 
restriction. The network members know very 
little about other positions and about behav- 
ior of any except adjacent positions. This is 
a condition that does not hold in small groups. 
The effect of this factor can, of course, be re- 
duced to some extent by changes in the pro- 
cedures of the network studies. In the Guetz- 
kow-Simon (1955) study, for example, this 
may have been done by having intertrial ad- 
ministrative discussions. 

3. Necessity of each member. In almost all 
the network studies, each member is essential, 
because each member holds an essential piece 
of information and each member must present 
a solution to the problem. In some cases, one 
member may have more or less information,
but in almost all the studies the elimination
of one member prevents success of the group. 
This is not generally true in real-life groups. 


These special characteristics of the 
network studies would make gen- 
eralization difficult even if the findings 
were unequivocal. The applicability 
of the findings of the network studies 
is in question because the characteristics
of the structures employed in the
studies are very different from other 
small groups. The following point, 
however, may be argued: If the net- 
work studies have any application, it 
will not be in the small group, but in a
much larger unit such as an indus- 
trial corporation or an army. Char- 
acteristics analogous to those listed 
above are more clearly present in 
large groups. For example, depart- 
ments of a company may not have 
direct communication channels; they 
often lack information concerning 
distant sections and all departments 
may be necessary for the company to 
function. 

If the laboratory network cannot 
be viewed as a simplification of the 
general small group situation, can it 
be viewed as a laboratory simplifica- 
tion to permit testing of an explicit 
theory about group behavior? The 
answer, unfortunately, is no. At the
present time, a theory concerning 
behavior in the network does not 
exist. This raises a major point. Per- 
haps the most surprising thing about 
the entire area is that despite the 
highly formal origins of these studies 
(Bavelas, 1948), the organized body 
of theory promised by the approach 
has not yet appeared. 

Perhaps in response to considera- 
tions such as these, two attempts 
have been made to use a somewhat 
different approach to the study of the 
effects of group structure on behavior. 
One of these attempts is by Lanzetta 
and Roby. Their main aim is to draw
the experimental situation closer to a 
known type of group—the military 
work team. The other attempt is by 
Rosenberg and Hall. Their main 
objective is to simplify the experi- 
mental situation (two-person situa- 
tions) and rephrase the problem so 
that available theory—learning the- 
ory—can be brought to bear on the 
problem. Both approaches assign 
new definitions to the term structure. 
For Lanzetta and Roby, team struc- 
ture refers to the specialization and 
interrelation of jobs in a team. For 
Rosenberg and Hall, team structure 
refers to the degree to which the 
information that an individual re- 
ceives about his performance is con- 
founded by the performance of an- 
other team member. 


EMPHASIS ON DISTRIBUTION OF 
FUNCTIONS IN THE SIMULATED 
TEAM 


Lanzetta and Roby have directed 
one major attempt to examine, from 
a new viewpoint, the relation of group 
structure to performance. Their 
attempt is embodied in a series of 
studies in which they vary the ways 
that team members depend on each 
other for information. In a situation
quite unlike the Bavelas network,
modeled after military teams (e.g., a
bomber crew), they gave teams a series
of very short problems in order to 
approximate a continuously changing 
environment. They vary communica- 
tion structure not by interdicting 
channels as in the network studies, 
but by restricting relevant informa- 
tion or specific functions to a given 
position. Their team is like the All- 
Channel network but with each sub- 
ject working on a separate problem 
and holding some information required by
other team members. Despite these differences
in experimental situation and in definition of
structure, the basic concern remains the
same: what factors in the organization
of a group affect its performance?

An early study by Lanzetta and
Roby (1956b) indicates both the de- 
velopment of their experimental situ- 
ation and the type of practical situa- 
tion from which it grew. In this 
study, they investigate the effect of 
two methods of work distribution 
(work structure) under two task load 
conditions on group performance. 
They model the experimental situa- 
tion after an air defense center with 
two work structure conditions. In 
vertical structure, each group member 
had one of three tasks: tracking air- 
craft, identifying aircraft and keeping 
a record of the interceptors’ fuel 
status, or deploying friendly planes. 
In horizontal structure, each group 
member performed all three functions 
for his own targets. Variations of the 
number of airplanes produced two 
different task load conditions. Of the 
main independent variables of the
study (structural organization, load
conditions, and their interaction),
only load condition has a significant
effect. The interpretation of this ef- 
fect is, however, complicated by a 
significant interaction with sessions. 
The main outcome of the study was 
a methodological development rather 
than an empirical finding. It led to a
simpler task with higher reliability 
for use in the subsequent studies. 
This task, modeled after a bomber 
crew’s task, was used in their next 
experimental study (Roby & Lan- 
zetta, 1956a) to demonstrate the 
effect of relaying requirements upon 
group performance. Groups of three 
subjects sat, each in a separate booth 
that contained instrument reading 
displays, pairs of control switches, 
and instructions giving the correct 
switch settings for each possible pair of instrument readings. The instrument readings required to set a given
control could be displayed in the 
booth containing the related control 
or they could be shown in one of the 
other booths. In the latter case, the 
subject receiving the information 
would have to relay it to its eventual 
user over the intercom system con- 
necting the booths. 

Roby and Lanzetta used four com- 
munication structures which differed 
in the degree to which the subjects 
had direct access to the information 
they needed. A significant difference 
in the number of errors appears be- 
tween the communication conditions. 
Analysis indicates that more errors 
are made on a control if its two rele- 
vant items of information had to 
come from two sources rather than 
from one source. The results cannot 
be considered surprising. If a subject 
has to get the necessary information 
from someone else, who is also busy, 
he will not do as well on a highly 
speeded task as a subject who has 
his information immediately avail- 
able. Furthermore, if he has to make 
two separate information requests in 
a brief (15-second) period, he will be 
more likely to fail than if he has to 
make only one request. 
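The reasoning above can be made concrete with a toy calculation. The sketch assumes, purely for illustration, that each information request gets through within the 15-second window with the same probability `p_single`, and that requests succeed or fail independently; neither assumption comes from Roby and Lanzetta, and the value 0.8 is hypothetical. Under these assumptions a control that depends on k relayed requests is set correctly only if all k succeed, so its success probability is p to the power k.

```python
def p_all_requests_succeed(p_single: float, n_requests: int) -> float:
    """Probability that every relayed request succeeds in time.

    Hypothetical model: requests succeed independently, each with
    probability p_single. Not taken from Roby and Lanzetta.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    return p_single ** n_requests


# Illustrative value only: assume a single request gets through 80% of the time.
p = 0.8
for sources in (0, 1, 2):
    print(sources, "source(s) to query:", round(p_all_requests_succeed(p, sources), 2))
```

With these made-up numbers, success falls from certainty (all information in the subject's own booth) to 0.8 with one source and 0.64 with two, which is the direction of the reported error difference, though the real task need not follow this simple product rule.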

Lanzetta and Roby (1956a) next 
consider the effect of type of input 
presentation on efficiency in two 
communication structures employed 
in the previous study: a high de- 
pendence (or low autonomy) condi- 
tion in which each member had to get 
all of the instrument readings neces- 
sary to operate his controls from 
other team members; and a low de- 
pendence condition in which a mem- 
ber had three out of the four neces- 
sary instrument readings available 
in his booth. They varied two aspects of the task input: task load (the time interval between successive presentations of instrument readings) and the predictability of the order of presentation to the three booths. They find again that high dependence gives rise to more errors, especially when the information has to be relayed from several different sources. For both structure conditions, errors increase as the rate of change of instrument readings increases, but predictability of the order of instrument changes has no significant effect.

These findings are further sup- 
ported in another study in which 
Lanzetta and Roby (1957) investi- 
gate learning and the details of 
communication behavior in their 
team situation. They vary depend- 
ence (relaying requirement), task 
load (speed of presentation of input), 
and operating procedure, as determined by instructions to “volunteer” information or to “solicit” information.

In a later study, Roby and Lan- 
zetta (1957) consider the effect of 
“load balancing” or distribution of 
work. They used three structures 
that varied the relation between the 
number of instrument displays and 
the number of control switches for 
which the subject was responsible. 
In Structure I (equal observation 
load) a booth had either one, two, or 
three control switches, but it always 
had two displays. In Structure II (unequal load) a booth that had one control switch had one display; a booth with two control switches had two displays; etc. In Structure III (balanced load) a booth with one control switch had three displays; a booth with two control switches had two displays; etc. The experimental design is quite complicated and confounds the load-balancing and dependence variables. The authors, however, conclude that “both load balancing and autonomy are influential but that the latter is more heavily weighted in this task” (p. 174).
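One way to see why Structure III is called "balanced" is to add up, for each booth, the control switches and displays it handles. The sketch below rests on two assumptions not stated in the source: that a booth's workload can be summarized as controls plus displays (a hypothetical index), and that the third booth in Structures II and III follows the pattern implied by "etc."

```python
# (control switches, displays) per booth for Roby and Lanzetta's (1957)
# three structures. The third booth in Structures II and III extends the
# stated pattern (the text says only "etc."), so those entries are assumed.
STRUCTURES = {
    "I (equal observation load)": [(1, 2), (2, 2), (3, 2)],
    "II (unequal load)": [(1, 1), (2, 2), (3, 3)],
    "III (balanced load)": [(1, 3), (2, 2), (3, 1)],
}


def combined_load(booths):
    """Return controls + displays for each booth (a hypothetical load index)."""
    return [controls + displays for controls, displays in booths]


for name, booths in STRUCTURES.items():
    loads = combined_load(booths)
    print(f"Structure {name}: loads {loads}, spread {max(loads) - min(loads)}")
```

Under this index, Structure III gives every booth the same combined load, Structure I spreads it moderately, and Structure II concentrates it, which matches the labels the authors attach to the three conditions.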

The major accomplishment of Lan- 
zetta and Roby is their introduction 
of controlled, experimentally manip- 
ulable tasks that capture more of the 
characteristics of real-life teams than 
do the earlier Circles and Wheels. 
They have also theorized extensively 
(Roby, 1957; Roby & Lanzetta, 
1956b, 1958). The real payoff in 
their work will come, however, when 
theory and experimental work merge. 
Their theorizing consists of general 
statements that never arrive at the 
prediction or explanation of specific 
events. Without a theory to generate 
novel and testable predictions, the 
experiments usually establish the ob- 
vious, e.g., if a subject has to check 
with many people before he makes a 
response, he is not likely to complete 
the response in a short time period. 
Although Lanzetta and Roby have not completed the merger of theory and experiment, they have brought them several steps closer together. A
summary of their experimental find- 
ings is presented in Table 6. 


TABLE 6
SYNOPSIS OF LANZETTA AND ROBY STUDIES

Investigator | Task | Independent variable
Lanzetta and Roby (1956a) | | Dependence; Task Load: Input Presentation Rate; Predictability of Input
Lanzetta and Roby (1957) | |
Roby and Lanzetta (1957) | Simulated military crew |

a Corrected for length of trial.


EMPHASIS ON FEEDBACK
AND LEARNING


Rosenberg and Hall have recently examined the effects of group structure from a viewpoint different from Lanzetta and Roby's. Rosenberg and Hall see the composition of information feedback to the individual members as a key aspect of structure. They concern themselves, therefore, with the relation of structure, defined in terms of information feedback, to performance. Figure 5 illustrates the basic structures they study. S* is the stimulus which precedes a response, R is the response, and Sf is the feedback stimulus, i.e., the state of affairs in which the individual finds himself after performing the response. In the “direct” feedback condition the Sf reflects only the subject's own performance. With “confounded” feedback the response of one subject combines with that of another, so that his feedback is a function of his teammate's performance as well as his own. With “other's” feedback the subject receives feedback solely from someone else's performance. In order to investigate the relation of these structures to performance, Rosenberg and Hall have carried out a series of studies using variations of an experimental situation similar to that of Sidowski, Wyckoff, and Tabory (1956).

Fig. 5. Feedback conditions used by Rosenberg and Hall. [Figure not reproduced; surviving panel labels: Confounded Feedback, Other's Feedback.]

In their first study, Rosenberg and 
Hall (1958) ran two-man groups un- 
der the three structures described 
above. The task was to learn to turn 
a knob a required number of turns. 
The amount of error (Sf value) was
displayed to the subject after each 
trial. Under direct feedback, each 
subject had to learn to turn the knob 
four times. Under confounded feed- 
back, the two team members had to 
attain a team average of four turns. 
They could reach this average by 
totaling eight turns distributed in 
any fashion between them. Under 
“other's” feedback, the subject had a 
perfect score displayed only if his 
partner turned the knob four times. 
The design of the study permitted the 
evaluation of both the effects of the 
subject’s own feedback condition 
and his partner’s feedback condition 
(which could be different) upon the 
subject's performance. The dependent variables were individual accuracy, team or average accuracy, and role differentiation, a function of the absolute difference between the response magnitudes of the two team members.
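The three feedback conditions in the dial-turning task can be sketched as three rules for what a subject is shown after a trial. This is a hypothetical scoring, not Rosenberg and Hall's apparatus: it assumes the displayed error is the relevant quantity minus the target of four turns, and it takes role differentiation to be the plain absolute difference |R1 - R2|, whereas the source says only that it is "a function of" that difference.

```python
TARGET = 4  # required number of knob turns (team average under confounding)


def feedback_error(own: float, partner: float, condition: str) -> float:
    """Signed error shown to one subject after a trial (hypothetical scoring)."""
    if condition == "direct":
        shown = own                      # only the subject's own response counts
    elif condition == "confounded":
        shown = (own + partner) / 2      # team average of the two responses
    elif condition == "other's":
        shown = partner                  # only the partner's response counts
    else:
        raise ValueError(f"unknown condition: {condition}")
    return shown - TARGET


def role_differentiation(r1: float, r2: float) -> float:
    """Absolute difference between the two members' response magnitudes."""
    return abs(r1 - r2)


# A compensating pair: one subject persists at six turns, the other makes two.
r1, r2 = 6, 2
for cond in ("direct", "confounded", "other's"):
    print(cond, feedback_error(r1, r2, cond), feedback_error(r2, r1, cond))
print("role differentiation:", role_differentiation(r1, r2))
```

The six-and-two pair shows why confounded feedback can yield good team accuracy alongside role differentiation: both members see zero error, although neither makes four turns individually.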

The subjects learn most rapidly 
and to the highest level of proficiency 
under direct feedback. With con- 
founded feedback the subject learns, 
but more slowly and to a lower level 
of proficiency. There is no improve- 
ment in individual accuracy under 
“other's” feedback. The partner's 
feedback condition has no significant 
effect on the subject's accuracy. 
With respect to team product, con- 
founded feedback yields team accu- 
racy (average performance) that is 
at least as good as that obtained with 
direct feedback. “Other's” feedback gave clearly inferior team performance. In the confounded feedback condition, one subject evidently learned to make two turns if his partner persisted in making six turns, so that both subjects would have an average of four. Rosenberg and Hall label this compensatory difference between response magnitudes role differentiation. They find that the confounded feedback condition shows more role differentiation than the direct feedback condition. The “other's” feedback condition, however, shows the greatest amount of all. Rosenberg (1959b) also considered the effect of switching subjects from a direct feedback situation to other structures. After the switch, the three structures show the same effects as above.

Hall (1957), using similar appa- 
ratus, investigated two independent 
variables: type of pretraining, and the relative weights assigned to the responses of the team members during confounded feedback. He varied pretraining conditions by pretraining some subjects under direct feedback and others under the same confounded feedback conditions they received during later trials. The experimenter used two confounded feedback weightings, equal and unequal. In equal weighting, he fed back the mean of the two members' responses, or

$$S^f = \tfrac{1}{2}R_1 + \tfrac{1}{2}R_2$$

as in the previous experiments. In the unequal weighting, he weighted the responses of one member three times as heavily as the other, i.e.:

$$S^f = \tfrac{3}{4}R_1 + \tfrac{1}{4}R_2$$

The dependent variables were team 
accuracy and role differentiation. 
The feedback weighting conditions 
do not have any significant effect on 
the dependent variables during either 
pretraining or training. In discussing 
the results, Hall emphasizes the com- 
pensatory behavior that occurs in the 
confounded feedback situation. 
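The compensation Hall emphasizes can be made explicit by solving each weighting for the response one member must give to bring the feedback to target, given the other member's response. As an assumption for this worked example, the target is taken to be the four turns used in the earlier experiments, and R_1 denotes the more heavily weighted member under unequal weighting:

$$
\tfrac{1}{2}R_1 + \tfrac{1}{2}R_2 = 4 \;\Rightarrow\; R_2 = 8 - R_1,
\qquad
\tfrac{3}{4}R_1 + \tfrac{1}{4}R_2 = 4 \;\Rightarrow\; R_2 = 16 - 3R_1 .
$$

Under equal weighting, a partner who persists at six turns is offset by two turns. Under unequal weighting, the lightly weighted member must shift three turns for every one-turn deviation of the heavily weighted member (for instance, R_1 = 5 requires R_2 = 1), so the required compensation is steeper, even though Hall found no significant effect of weighting.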

Zink (1957) carried out a further 
study in this series using a more com- 
plex task and a different rule for 
determining feedback. Contrary at 
least to the reviewers’ expectations, 
the results indicate greater role differ- 
entiation for the simple task than for 
the complex task. Rosenberg (1959a) later tried to produce role differentiation in Zink's complex task by pretraining the subjects under direct feedback. His hypothesis was that the subjects had not reached the level of proficiency in Zink's complex task that would permit them to adjust to a partner's behavior. He is, however, unable to obtain differences in role differentiation between subjects given different amounts of direct feedback pretraining.


In a final set of three experiments, 
Rosenberg (1960) systematically ex- 
plored the effect of various combina- 
tions of feedback weights on team 
performance. He also varied the 
informational content of the feed- 
back by letting some groups know 
only that an error had occurred and 
by informing other groups about both 
the occurrence and the direction of 
the error. On the basis of detailed 
consideration of the effects of feed- 
back upon the response of the sub- 
jects in the various structures, Rosen- 
berg makes predictions concerning 
the development of complementary 
or cooperative behavior. In general, 
he finds that more stable response 
patterns develop as the amount of 
information concerning the direction 
of errors increases. If both subjects 
have a feedback weight of .50 or more 
on their own response, then their 
combined responses tend to stabilize 
at some optimal value, i.e., one in 
which both members receive maxi- 
mum reinforcement. The accuracy 
of the group product is therefore 
maximized. 
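The link between self-weighting and stable joint responding can be illustrated with a toy model that is explicitly not Rosenberg's. Assume each subject sees feedback S_i = w_i R_i + (1 - w_i) R_j and, after every trial, moves his own response a fixed fraction `RATE` of the way against the signed error he sees. The target of four turns, `RATE`, and the starting responses are arbitrary illustrative choices. Under this hypothetical rule the pair settles at four turns each when both self-weights exceed .50, settles on some compensating pair that sums to eight when both are exactly .50, and drifts apart when both are below .50.

```python
TARGET = 4.0   # required result in turns (assumed, as in the dial-turning task)
RATE = 0.5     # hypothetical fraction of the seen error corrected per trial


def simulate(w1: float, w2: float, r1: float, r2: float, trials: int = 50):
    """Toy adjustment model, not Rosenberg's: each subject sees
    S_i = w_i * R_i + (1 - w_i) * R_j and moves R_i against the signed error."""
    for _ in range(trials):
        e1 = w1 * r1 + (1 - w1) * r2 - TARGET    # error seen by subject 1
        e2 = w2 * r2 + (1 - w2) * r1 - TARGET    # error seen by subject 2
        r1, r2 = r1 - RATE * e1, r2 - RATE * e2  # simultaneous update
    return r1, r2


for w in (0.75, 0.5, 0.25):
    r1, r2 = simulate(w, w, r1=6.0, r2=1.0)
    print(f"self-weight {w}: R1={r1:.2f}, R2={r2:.2f}, |R1-R2|={abs(r1 - r2):.2f}")
```

In this sketch the difference between the two responses shrinks, stays fixed, or grows according to whether the common self-weight is above, at, or below .50, which is one simple way to see why weights of .50 or more can support a stable, mutually reinforced joint response.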

With these experiments, this group 
has moved very far from the original 
network studies. In their earlier 
work (Rosenberg & Hall, 1958), 
communication between the team 
members dropped out as an explicit 
independent variable. In the last 
study, amount of reinforcement re- 
ceived replaces group accuracy as the 
dependent variable of primary inter- 
est. The work of Rosenberg and Hall 
has certain basic similarities to the 
work of Lanzetta and Roby. Here 
again the experimenters accomplish 
a very able reduction of the real-life 
team to laboratory proportions. The 
contribution with respect to methods 
is considerable. Here again, however, 
the work generates obvious results. 
The one study (Rosenberg, 1960) 
with novel and systematically related 
results is one that has moved away
almost completely from the variables 
of the early studies of group struc- 
ture. The work of Rosenberg and 
Hall is summarized in Table 7. 

It is hoped that as the methods in 
the area are improved, theories which 
can tie together disparate findings 
and generate new predictions will 
develop. Rosenberg and Hall have 
done even more than help prepare 
the methodological groundwork for 
the phase of theorizing that is needed 
now. By reducing social interaction 
to feedback conditions, they have 
prepared the way for an attack with 
the armament of learning theory. 
(This has actually begun in work 
being carried out by Burke, 1959, and 
his associates.) Whether such an at- 
tack can be made without giving up 
the original objective of studying 
group structure remains to be seen. 


SUMMARY AND CONCLUSIONS 


Since the initial stimulus provided by Bavelas in 1948, considerable effort has been spent on the study of the effect of structure upon group and individual behavior. The main original questions posed were: What effect does the structure of the group have upon the efficiency of its behavior? What effect does position in the group have on morale and job satisfaction? There is no clear answer to the first question. The answer to the second question is that occupants of central positions are in general more satisfied with their tasks than occupants of peripheral positions.

Later investigators went beyond the first two questions to study other variables. Heise and Miller introduced a task complexity variable, the condition of communication interference (noise), and one-way channels. Guetzkow and his collaborators introduced the distinction between task behavior and organizational activity. Shaw continued the original trend of the experimental work and also investigated the effects of various types of distribution of information and task complexity. Christie, Luce, and Macy brought mathematics and information theory to bear on the communication networks. They presented the theory of “locally rational” behavior to explain learning in the networks and differences in performance between networks.

TABLE 7
SYNOPSIS OF ROSENBERG AND HALL STUDIES

Investigator | Task | Independent variable | Dependent variable | Findings
Rosenberg and Hall (1958) | Dial turning | Subject's Feedback Condition: Direct, Confounded, “Other's”; Partner's Feedback Condition | individual accuracy; team accuracy; role differentiation | SF→ia: D > Cf > Oth; PF→ia: 0; SF→ta: D, Cf > Oth; SF→rd: Oth > Cf > D
Rosenberg (1959b) | Dial turning | Subject's Feedback Condition (Pretraining on Direct Feedback); Partner's Feedback Condition | individual accuracy | SF→ia: D > Cf > Oth; PF→ia: 0
Hall (1957) | Dial turning | Subject's Feedback Condition in Pretraining; Subject's Feedback Condition in Training (Two Types of Confounding) | team accuracy; role differentiation | SFP→ta: 0; SFP→rd: 0; SFT→ta: 0; SFT→rd: 0
Zink (1957); also Rosenberg (1959a) | Display-control | Task Complexity; Degree of Individual Pretraining | role differentiation | TC→rd: simple > complex; DIP→rd: 0
Rosenberg (1960) | Dial turning | Subject's Feedback Condition: Direct, Various Types of Confounding; Information on Direction of “Error” | amount of reinforcement | SF→ar: +; Information on Direction of “Error”→ar: +

Key: SF, subject's feedback condition; PF, partner's feedback condition; SFP and SFT, subject's feedback condition in pretraining and in training; TC, task complexity; DIP, degree of individual pretraining; ia, individual accuracy; ta, team accuracy; rd, role differentiation; ar, amount of reinforcement; D, direct; Cf, confounded; Oth, “other's”; 0, no significant effect.

Neither the straight empirical work 
nor the mathematically sophisticated 
analyses have approached the goal, 
implicit in Bavelas’ original ques- 
tions, of a rational system for ar- 
ranging groups to maximize effi- 
ciency and satisfaction. The diffi- 
culties in building such a system 
may stem from the peculiar char- 
acteristics of the Bavelas network 
and the absence of a theory to order 
the data it generated. 

In response to these difficulties, more recent investigators have reoriented the work on group structure. Lanzetta and Roby have redefined structure in terms of direct versus indirect accessibility of task information and distribution of task information. Under this type of definition they have constructed new types of groups and tasks. These investigators also have made some moves toward meeting the need for a theory in the area. Rosenberg and Hall have attempted to rephrase the problem and redesign the experimental setting so that learning theory can play the organizing role. To do this, they define structure in terms of the effect of one subject's responses on another subject's feedback (reinforcement) and have studied the effect of various feedback arrangements on group (dyad) and individual behavior.

At the present time, there is still a 
major need for a system to order the 
data already obtained and to direct 
further work on the effects of group 
structure. The difficulty in construct- 
ing this system may arise from the 
inappropriateness of either the ex- 
perimental situations or the concepts 
that have been used. Attempts have 
been made to remedy both of these 
possible defects. The success of these 
attempts will determine whether this 
review is a prologue or an epitaph. 
The concept of secondary reinforcement has been extremely useful
to psychological theorists and experi- 
ments centered around it have 
brought fruitful and important addi- 
tions to our knowledge of behavior. 
Repeatedly, it has been demonstrated 
in animal experiments that stimuli 
which are paired with food or water 
will gain the power to “reinforce” behavior, either to sustain old responses or to fixate new ones. However,
while reinforcement theorists have 
generally made no distinction be- 
tween the functional properties of 
food and water reinforcement and the 
reinforcement provided by the ter- 
mination of noxious stimuli, almost 
all attempts to use the latter type of 
event as the basis for establishing 
secondary reinforcement have pro- 
duced negative results. Yet, for 
example, when electric shock is used 
to motivate and reinforce behavior 
directly, learning is powerful and 
prompt. 

If secondary reinforcement cannot be established with the termination of a noxious drive, there would seem to be little point in using drive reduction as the fundamental reinforcement mechanism in theoretical systems which at the same time lean heavily on secondary reinforcement, as, for example, Hull (1943), Mowrer (1956, 1959), and Miller (1951) have done. Indeed, the failure to demonstrate this phenomenon could be construed as presumptive evidence in favor of the opposite view: namely, that secondary reinforcement is a phenomenon involving motivational increments, particularly those related to the stimulating properties of the anticipatory goal response. Spence (1956) and Seward (1952) have argued for this position.

1 Part of this material was originally presented in an unpublished doctoral dissertation, University of Illinois, 1958.

2 Now at Wake Forest College, Winston-Salem, North Carolina.

The writer wishes to express his appreciation to O. H. Mowrer for critical discussions
Grice, L. I. O'Kelly, and W. F. Crowder for

In brief, then, while one would hesitate to mention such a will-o'-the-wisp as a “crucial experiment,” we cannot take lightly Mowrer's (1959) suggestion that the fate of drive-reduction theory may rest on the successful demonstration of secondary reinforcement established in conjunction with the reduction of shock or some other noxious stimulation. The purpose of the present paper is to evaluate the evidence on the problem with an eye toward finding and/or proposing experimental tests of the hypothesis that secondary reinforcement can be so established.


EXPERIMENTAL EVIDENCE 


For convenience the experimental 
evidence is categorized as follows, ac- 
cording to the method used in testing 
for reinforcement: (a) response acqui- 
sition, including bar pressing, T maze 
learning, head turning, and pushing a 
nose-key; (b) response extinction; (c) 
delay of reinforcement; (d) response 
facilitation; and (e) preference testing 
(other than T maze). 
Response Acquisition


Bar pressing. One of the first to 
claim positive results on this problem 
was Barlow (1952). In the two ex- 
perimental groups of interest here, a 
5-second light either came on (a) in 
the last 5 seconds of a 10-second 
shock, or (b) immediately after the 
shock. After a single such pairing 
each rat was tested 20 hours later, 
with total duration of bar pressing as 
the measure of reinforcement. For 
half the animals in each training 
group the light was continuously on 
and could be turned off by pressing 
the bar, while for the other half the 
light was off and could be turned on. 
The animals in Group a showed no significant difference in duration of bar pressing, though they tended to turn the light on more than off. The animals in Group b showed a significant difference in this same direction. These results are very weak supporting evidence and give rise to two points of interest. First, secondary reinforcement was supposed to have been established with a single pairing. This result may be compared with those of Bersh (1951), who found that 80 light-food pairings did not produce a reinforcement effect significantly different from zero pairings. Second, significant results were found for the group in which the light came on after shock termination, but not for the group in which the light preceded it. The light in this instance could have come before, or have been paired with, a discriminable drop in drive, thus accounting for the apparent backward conditioning.

But it would appear from other sources that this particular sequence might bring about backward aversive conditioning, not positive conditioning. Razran (1956) concludes from his review of the literature that “with shock as the US, backward conditioning seems to be possible only when the CS is applied after the shock has ceased, and not when it is applied during the action of the shock.”
Mowrer and Aiken (1954) found that 
a signal following immediately upon 
the termination of shock later in- 
hibited a food-reinforced response. It 
is consequently possible that in 
Barlow’s experiment the light was 
aversive and when the subjects 
pressed the bar the onset of the light 
caused them to “freeze” on it, thus 
producing relatively long durations of 
pressing. Perhaps the duration meas- 
ure is not the most satisfactory index 
of reinforcement. 

Littman and Wade (1955) used a 
tail-shock apparatus to pair a light 
with shock termination. In a differ- 
ent apparatus, with light as the rein- 
forcement for bar pressing, the rats 
did not press more than control sub- 
jects. Deutsch (1956; see also Litt- 
man & Wade, 1956) raised several 
questions about this experiment, the 
most pertinent of which concerns the 
use of a different apparatus for test- 
ing than that used for training. 
Direct evidence regarding the po- 
tency of a secondary reinforcer in 
transituational testing is not plenti- 
ful, and it is probably unwarranted 
to assume secondary reinforcement 
should be obtainable in a situation 
very different from that in which the 
subjects are trained, even with ap- 
petitive reinforcement. 

Beck (1958) trained rats to escape 
from grid shock with a lighted T 
maze door and a tone serving as cues. 
After 180 training trials there were 
two 10-minute test periods during 
which the subjects were locked in the 
choice area of the maze with a newly- 
introduced bar. Shock was on con- 
tinuously during testing and escape 
was not permitted. When the bar was 
pressed by the subjects in the experimental group the light and tone came
on for 2.5 seconds. In the first of two 
replications of this experiment there 
was some indication that the light and 
tone were reinforcing bar pressing, 
but the effect was not strong and did 
not appear in the second replication. 
To further complicate the results, 
animals trained to escape the shock 
without any light or tone made as 
many or more bar presses with these 
as “reinforcers” as did the main experimental group.3 A major difficulty in using grid shock is that the animals can get “primary” reinforcement by
hitting upon any movement or pos- 
ture which reduces the pain. Such 
responses can either compete with 
the test response or facilitate it, 
thereby increasing variability and 
possibly eliminating significant differ- 
ences which might otherwise have 
been obtained. 

T maze learning. The paradigm for 
the T maze experiments is funda- 
mentally the same as Saltzman’s 
(1949) experimental design. After 
rats have been reinforced in a dis- 
tinctive goal box at the end of a 
straight runway, the reinforcing prop- 
erties of the goal box are tested by 
putting it on one arm of a T maze, 
comparing turns to this box by ex- 
perimental and control groups. 

Smith and Buchanan (1954) used an amount-of-reinforcement design with this paradigm. Leaving aside the various controls, they trained one group of rats to run across an electrified grid to get food in a black goal box and across a sponge runway to get food in a white goal box. A second group had the colors of the goal boxes reversed. The goal box associated with both food and shock reduction should take on greater secondary reinforcing capacity than the goal box associated with food alone. In a black-white discrimination situation, with the black goal box of a T maze positive, it was predicted that the animals shocked before entering the black goal box should make fewer errors than the group shocked before entering the white goal box. The results bore out the prediction quite well.

3 In this experiment secondary reinforcement was not demonstrated when the same procedure was used with animals trained with water reinforcement and tested under 23 hours' water deprivation. The possible reasons for this, as well as pertinent data, are presented at length in the original report.

In three later experiments with the 
same basic design, Buchanan (1958) 
found that (a) rats would “increase 
their tendency to approach cues con- 
tiguous with escape from a fear-pro- 
ducing situation, as well as those con- 
tiguous with escape from shock”; 
(b) “the approach tendencies, ac- 
quired by hungry rats during training 
to cues associated with shock reduc- 
tion and hunger reduction, were not 
appreciably affected by changes in 
the drive conditions of hunger and 
fear between training and testing”; 
and (c) “shock reduction and hunger 
reduction were approximately equal 
in their effects on the strength of ac- 
quired tendencies to approach asso- 
ciated cues, and that the drives of 
hunger and shock and/or their re- 
spective incentives combine in some 
fashion in the development of these 
approach tendencies” (p. 362). 

In a similar kind of study, Nefzger 
(1957) trained rats to run across a 
grid into a distinctive end box. He 
hypothesized that as training pro- 
gressed, the animals should show an 
increasing preference for this end box 
if it were on one arm of a T maze. He 
recorded no change of preference 
with repeated testing, however, and 
was unable to duplicate the Smith
and Buchanan results when response 
elicitation by the goal box itself was 
controlled. 

The problem of controlling for re- 
sponse elicitation during tests for 
secondary reinforcement is important 
for a number of experimental designs 
which utilize the same response in 
both training and testing.4 By “elicitation” we refer to the capacity of a stimulus to evoke or otherwise exert discriminative control over a response. The “reinforcing” property of a stimulus refers to its capacity to fixate or otherwise prolong responding in some manner. The question raised by an experiment such as Smith and Buchanan's is whether the black goal box in the T maze is eliciting an earlier-learned approach response (the goal box being visible from the choice point), or whether it is reinforcing some new turning response. If we are testing the hypothesis that reinforcement can be demonstrated in such a situation, we must accept the eliciting interpretation over the reinforcing interpretation in case of any doubt.
Buchanan, in his later experiments, 
did in fact abandon secondary rein- 
forcement as the sole interpretation 
of his results and referred to the 
“eliciting and/or reinforcing” proper- 
ties of the goal box. McGuigan 
(1956) has also discussed this general 
problem in relation to Hull’s treat- 
ment of secondary reinforcement. 

4 Here we are concerned with the logical problem of interpreting experimental results in an ambiguous situation. In a later section we shall consider in detail the evidence related to the so-called “discriminative stimulus hypothesis” of secondary reinforcement.

Head turning. Coppock (1950, 1951) obtained results indicative of the establishment of secondary reinforcement, pairing a blinking light with tail-shock termination. In his tests, whenever the rats had their heads turned 22° to a particular side they were continuously reinforced by the blinking light. The results were only weakly positive, however, and only for those animals which (a) had the light following shock termination, and (b) were reinforced with the head on the initially nonpreferred side. The effect thus appears to be very unstable, if real at all, and the use of the duration measure of head position is open to the same ambiguity as in Barlow's experiment, i.e., the blinking light might be producing fearful “freezing” of the head in the position.

Key-nosing. Crowder (1958) used 
a tail-shock apparatus with rats in 
several different experimental de- 
signs, all of which were characterized 
by very precise control of the shock, 
conducting tests for secondary rein- 
forcement with the shock on, and 
using the pushing of a nose-key 
(similar to a Gerbrands pigeon key) 
in the front of the apparatus as an 
operant response. He found in one 
study that a light repeatedly paired 
with the termination of inescapable 
shock did not later have a significant 
reinforcing effect on the nosing re- 
sponse. 


Response Extinction 


The familiar model for this type of 
design is the comparison of rate of ex- 
tinction with and without the pre- 
sentation of a secondary reinforcer 
following responses during extinction. 
Bugelski’s (1938) experiment is the 
prototype. 

Crowder (1958), with his tail-shock and nose-key apparatus, gradually increased shock to a maximum intensity over a 25-second interval. The first nosing response after the shock reached its peak was immediately followed by a 0.5-second presentation of light, and then the shock was terminated. In both extinction and reconditioning, presentation of this light as a reinforcer significantly increased response rate. However, a modification of this procedure in which the shock during training came on instantaneously at full intensity produced completely negative results. Crowder’s positive results seem to be the best indication of secondary reinforcement among all the studies reviewed, and his technique of gradually increasing shock intensity appears promising, though it may do little more than adapt the animals to the pain.

Murphy, Miller, and Brown (1958) 
studied the extinction of an avoid- 
ance response. During training a 
light followed barrier-jumping re- 
sponses to the shock and the CS. In 
extinction, with only the CS pre- 
sented, it was found that the presen- 
tation of the light after each response 
prolonged extinction very markedly 
and the authors interpreted this to 
mean that secondary reinforcement 
had been demonstrated with pain and 
fear reduction as primary reinforcers. 
On the other hand, we are again faced 
with the awkward problem of back- 
ward conditioning (the light followed 
response and reinforcement during 
training) and the criticisms of Bar- 
low’s experiment hold here, also. It 
seems just as reasonable to argue that 
the light had become aversive during 
training and retarded extinction by 
keeping the general level of fear high 
rather than serving as positive rein- 
forcement. A paper by Seeman and 
Greenberg (1955) is directly relevant 
to this experiment, as well as a brief 
report by Bender (1955). 

Beck⁵ trained five rats to escape from shock by running through the lighted member of a pair of adjoining plexiglass panels. Each animal was then put into the apparatus for 15 minutes with both panels locked and dark (no shock). Pushing against one of the panels always produced, for 0.5 second, the light that had been the positive stimulus, but escape was not permitted. Pushing against the other panel did not produce any illumination change. As predicted, (a) the subjects pushed the panel on the reinforced side significantly more than on the nonreinforced side, and (b) the percentage of total responses to the reinforced side was significantly greater than that of a control group trained without a cue stimulus. It is still possible, however, that when the light came on it was eliciting further responses rather than reinforcing a particular position habit.

5 Unpublished research, University of Illinois, 1957.


Delay of Reinforcement 


In a fourth experiment, Crowder 
(1958) used a light to bridge a delay 
between the nosing response and 
shock termination. Shock came on 
immediately at full intensity, then 
when the rat pushed the nose-key 
there was a 2-second onset of light, 
following which the shock was ter- 
minated. Eighty-five trials with this 
procedure did not result in signifi- 
cantly shorter latencies in occurrence 
of the response after shock onset than 
did a 2-second delay of reinforcement 
without the light. Both conditions 
were vastly inferior to immediate 
reinforcement. 


Response Facilitation 


Lee (1951) taught three groups of 
rats to bar press for food, then asso- 
ciated a tone with shock in a tail- 
shock apparatus. In one group the 
tone was associated with the onset of 
shock, with the termination of shock 
in a second group, and with no shock
at all in a third. These groups were 
now tested by pairing the tone with 
the previously-learned bar pressing 
response. According to his hypoth- 
esis, the group having tone associated 
with shock termination should press 
the most; the group without tone- 
shock experience an intermediate 
amount; the group with tone asso- 
ciated with shock onset should press 
the least. As it happened, pairing the 
tone with shock termination inhibited 
responding as much as pairing it with 
shock onset. 

Mowrer and Aiken (1954) obtained 
results similar to Lee’s. They paired 
a blinking light with shock onset and 
termination in various temporal se- 
quences and found that after the light 
had been contiguous with shock ter- 
mination (either just before or just 
after) its presentation inhibited bar 
pressing for food reinforcement. 
Mowrer and Aiken suggest that the 
light did not facilitate bar pressing 
because the animals were not afraid 
at the time of testing, i.e., the rele- 
vant motivating condition was not 
operative at the time of testing. 


Preference Testing 


With this paradigm, a distinctive 
environment is associated with escape 
from shock. This environment is 
then matched with some other stim- 
ulus context and the subjects’ prefer- 
ences are recorded. 

After training their animals to es- 
cape from shock by running to a non- 
shock escape chamber, Goodson and 
Brownstein (1955) found that the 
escape chamber was preferred to 
either the shock compartment or a 
neutral compartment. Again, how- 
ever, the test situation was so con- 
trived that the results can be inter- 
preted in terms of the elicitation of 
the previously learned escape response. During training the animals
learned to run away from the shock 
box and into the escape box. Both of 
these aspects of running were rein- 
forced by shock termination. In test- 
ing, the animals were put into an 
alleyway between two closed doors. 
These doors were simultaneously 
raised so that the animal was faced 
with the same situation encountered 
in training: an open door into the 
escape chamber, with all its eliciting 
cues, was present. The response 
scored, running into the escape com- 
partment, was exactly the same re- 
sponse on which the animal had been 
trained. 

Montgomery and Galton (1956) 
eliminated the testing ambiguity 
found in the Goodson and Brownstein experiment, but introduced another confusion. A rat was placed in a small plexiglass “trolley car” in one apparatus compartment; shock was turned on and remained on while the animal was pulled into a second compartment, where the shock was terminated. After a number of such trials the animal was put into the two-compartment situation for unrestricted running, and time spent in the two compartments was recorded. Unfortunately, the fact that the subjects preferred the side where shock terminated can be interpreted to mean that they were avoiding the fear-arousing side. This phenomenon is so well established that there is no need to invoke secondary reinforcement.

A preference-testing experiment 
which would give clear-cut results 
would combine the acquisition pro- 
cedure used in the Montgomery and 
Galton experiment and the test procedure of the Goodson and Brownstein experiment. Since in Montgomery and Galton’s procedure the animals are transported from one compartment to another, no particular running response is learned; hence, such a habit
could not manifest itself during test- 
ing and be confounded with a demon- 
stration of secondary reinforcement. 
Goodson’s test procedure of pairing 
a neutral box with both the shock box 
and escape box in preference tests 
should show whether the animals are 
simply avoiding the shock compart- 
ment or have learned a preference for 
the escape side. Gleitman (1955) re- 
ports a study which is, in principle, 
the same as this “ideal” experiment. 
Rats were placed in a transparent 
cable car with a grid floor. Shock was 
turned on in one part of the experi- 
mental room, continued while the 
rats were transported via cable to an- 
other part of the room, then turned 
off. The subjects were divided into 
three groups for the testing of pref- 
erences and it was found that the 
termination place was preferred to 
the shock-onset point, there was no 
preference between a neutral place 
and termination point, and there was 
no preference between a neutral place 
and shock-onset locus. The ambigu- 
ity in these results is the failure to 
show a clear approach or avoidance 
tendency in the groups having the 
neutral place as a choice. Two pro- 
cedural aspects of the experiment 
which might weaken any interpreta- 
tion placed on it are the facts that 
(a) because the animals were run in 
an “open” room there may have been 
present what Mowrer (1959) has 
called “ambiguous cues,” stimuli 
associated with both the onset and 
termination of shock; and (b) there 
were 11 experimenters, students in an 
experimental laboratory. As we shall 
see later, however, even a perfectly 
controlled experiment with this de- 
sign might fail to give positive re- 
sults, for it may not be possible to 
establish secondary reinforcement without the animals making a discriminative response during training.


Summary 


The experimental literature shows 
a variety of tests of the hypothesis 
that the termination of aversive 
stimulation can be used as the pri- 
mary reinforcer in establishing a 
secondary reinforcer, but there are 
few positive results. In those in- 
stances where there has been a clear 
experimental effect the interpreta- 
tion is generally confounded such 
that the concept of secondary rein- 
forcement need not be invoked. Only 
one experiment (Crowder) seems to 
be unambiguously positive, with a 
highly significant effect, but in view 
of his own negative results in other 
experiments, as well as the rest of the 
literature, this result does not inspire much confidence that the phenomenon exists. We must now
ask whether this predominantly nega- 
tive evidence is sufficiently strong to 
refute the theoretical positions which 
predict that secondary reinforcement 
should be established under such con- 
ditions or whether the explanations 
for the experimental failures are to be 
found in the experiments themselves. 
Toward this end a consideration of 
two variables related to secondary 
reinforcement becomes appropriate: 
namely, discrimination training and 
motivation. 


SECONDARY REINFORCEMENT 
AND DISCRIMINATION 
TRAINING 


Discriminative Stimulus Hypothesis of 
Secondary Reinforcement 


One of the difficulties involved in 
trying to interpret the experimental 
literature in the previous section was 
the lack of differentiation between 
the eliciting and reinforcing functions 
of stimuli. There, we considered only
the logical problem that in many 
situations the repeated occurrence of 
a response could be attributed to 
its elicitation by some previously- 
learned cue and that the concept of 
secondary reinforcement was there- 
fore superfluous. The black goal box 
visible from the choice point of a T 
maze is a case in point. The question 
to be considered now is somewhat 
different, to wit: what is the empirical 
relationship between the eliciting 
and reinforcing functions of stimuli? 
Specifically, can a stimulus serve as a 
secondary reinforcer without first 
having cue function? 

Keller and Schoenfeld (1950, p. 
236) have stated this discriminative 
stimulus hypothesis in firm terms: 
“In order to act as an S^r for any response a stimulus must have status as an S^D for some response.” This cue
function is established through a 
process of differential reinforcement, 
reinforcing a response in the presence 
of a stimulus and not in its absence. 
The importance of such a procedure 
in discrimination training has been 
shown by Ferster (1951), for example, 
who found that a stimulus continu- 
ously present during both reinforce- 
ment and nonreinforcement did not 
have the properties of a discrimina- 
tive stimulus, i.e., had no control over 
behavior. In order to clarify the na- 
ture of the establishment of a dis- 
criminative stimulus, its use as a 
secondary reinforcer, and the nature 
of the problem of the interaction of 
these two properties, a brief review 
of the typical Skinner-box training 
procedure is in order before examin- 
ing experimental results. 

The procedure is generally as follows: (a) An animal is trained to eat from a food magazine. (b) It is trained to eat in the presence of a certain stimulus, such as a light, or immediately following some stimulus, such as the click of the food delivery mechanism. Such stimuli are referred to as discriminative stimuli (symbolized by S^D) and are the same as the positive stimuli in any discrimination training situation. When the discriminative stimulus is absent and has not just occurred (depending on the kind of training situation), the animal is not reinforced for going to the magazine. (c) A bar is introduced for the animal to press. As soon as it is pressed, S^D follows immediately and the animal can go to the magazine and eat. The animal is then extinguished on bar pressing, with or without the S^D following the response, and the occurrence of S^D is found to increase resistance to extinction. As an alternative procedure the animal may originally learn to press the bar with only the S^D as reinforcement. Under either of these conditions the discriminative stimulus is referred to as a secondary reinforcer.

Most of the experimental tests of 
the discriminative stimulus hypothe- 
sis have been positive. Schoenfeld, 
Antonitis, and Bersh (1950) found
that the mere temporal contiguity of 
a stimulus with some reinforcer was 
not sufficient to establish this as a 
secondary reinforcer (compare with 
Ferster above, also). In their study, 
a light was associated with the con- 
sumption of food pellets, but not with 
obtaining them. After 100 such asso- 
ciations, the light did not increase the 
rate of bar pressing above operant 
level. 

Dinsmoor (1950) studied the dis- 
criminative stimulus—secondary re- 
inforcement relationship with an ex- 
tinction procedure. After training 
rats on bar pressing, half were ex- 
tinguished with the presentation of 
S^D as reinforcement following responses. For the other half, the bar was removed and the experimenter presented the S^D without food the same number of times that it had occurred in the first group. When the bar was again made available, the second group was found to have extinguished on bar pressing as much as the first group, i.e., response rate was reduced by the same amount. Extinction was carried further, with the “cue” and “reinforcing” functions of the S^D reversed for the two groups, and no
differential rate of extinction was 
found now either, a clear demonstra- 
tion of the intimacy of these two prop- 
erties of stimuli. Coate (1956) repli- 
cated part of Dinsmoor’s experiment 
with the same results. 

Notterman (1951) studied second- 
ary reinforcement as a function of 
amount of discrimination training. 
By interspersing a varying number of nonreinforced trials in which S^D was absent among a constant number of reinforced trials in which S^D was present, this investigator found that the more strongly the discrimination was thus developed, the greater the secondary reinforcing power of the S^D. McGuigan and Crockett (1957, 1958), Webb and Nolan (1953), and Wike
and McNamara (1956) report similar 
results. 
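Notterman’s manipulation, differential reinforcement with the S^D, can be pictured as a trial schedule in which the number of S^D-present, reinforced trials is held constant while the number of interspersed S^D-absent, nonreinforced trials varies. The Python sketch below only builds such a schedule. The trial counts, the random interleaving, and the names `Trial` and `discrimination_schedule` are hypothetical illustrations; the text does not report Notterman’s actual numbers or trial order.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    sd_present: bool  # is the discriminative stimulus on during this trial?
    reinforced: bool  # is the response followed by food on this trial?


def discrimination_schedule(n_reinforced, n_nonreinforced, seed=0):
    """Build a shuffled trial list for differential reinforcement.

    Responses are reinforced only on S^D-present trials; S^D-absent
    trials are never reinforced. Random ordering is an assumption.
    """
    trials = [Trial(sd_present=True, reinforced=True)] * n_reinforced
    trials += [Trial(sd_present=False, reinforced=False)] * n_nonreinforced
    rng = random.Random(seed)  # seeded so a given schedule can be reproduced
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    # Hypothetical design: 20 reinforced trials, varying nonreinforced trials.
    for n_neg in (0, 10, 40):
        schedule = discrimination_schedule(20, n_neg, seed=1)
        n_food = sum(t.reinforced for t in schedule)
        print(f"{n_neg:2d} S^D-absent trials -> {len(schedule)} total, {n_food} reinforced")
```

In this picture, schedules with more S^D-absent trials stand for stronger discrimination training, which Notterman found to go with a stronger secondary reinforcer; the sketch itself makes no prediction about response rates.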

These experiments indicate, then, 
that (a) with better discrimination 
training secondary reinforcement is 
stronger, and (b) when either the cue 
or reinforcing function of a stimulus 
is extinguished, the alternate func- 
tion also declines. The rationale for 
maintaining the distinction then is 
the way in which the stimulus is used, 
its temporal relationship to a particu- 
lar bit of behavior. Keller and 
Schoenfeld have clearly made this 
point, also. Contrary to these posi- 
tive results, however, there are three 
reports of experiments which appar- 
ently contradict the hypothesis. 

Rozeboom (1957) was unable to replicate the Dinsmoor-Coate results, and even got little depression of bar pressing after pairing the S^D with shock during the phase when the response to the S^D was being extinguished. He does not report any data
from the extinction period, unfortu- 
nately. In both Rozeboom's experi- 
ment and Wyckoff's (reported below) 
a somewhat unusual procedure was 
used. Rather than food reinforce- 
ment, water reinforcement was used, 
delivered by a dipper which was 
normally up and lowered into a reser- 
voir for water at the appropriate 
time. Rozeboom’s latent extinction 
procedure involved having the dipper 
mechanism operate without water 
during the cue extinction period. The 
subjects could still make the licking 
response to the dry dipper during the 
latent extinction period. 

Wyckoff, Sidowski, and Chambliss 
(1958) trained their rats to approach 
and lick the dry dipper when a buzzer 
sounded, whereupon the dipper de- 
livered water. After this training, a 
bar was inserted into the side of the 
box opposite the dipper and animals 
for whom the buzzer was contiguous 
with bar presses failed to make more 
responses than animals for whom the 
buzzer sounded automatically follow- 
ing 10 seconds of no bar pressing, the 
dipper no longer operating in either 
case. While the authors themselves 
felt “no inclination to reject the con- 
cept of secondary reinforcement,” 
they did believe that some crucial 
condition in the establishment of 
secondary reinforcement remains un- 
specified. In this experiment, the 
buzzer was not directly associated 
with water presentation and a consummatory response (licking water from a dipper), but instead was paired with an operant response (licking the dry dipper), which was similar to the consummatory response.
Perhaps the crucial difference sought 
for by Wyckoff is related to this fact 
—a suggestion made promising by 
Rozeboom's negative results with the 
dry dipper technique. 

On the basis of the Wyckoff study, 
Myers (1958) suggests that many bar 
pressing experiments thought to have 
shown secondary reinforcement may 
have really shown only heightened 
activity following the presentation of 
S^D. When this was controlled, “secondary reinforcement” no longer appeared. Direct experimental evidence
in support of this contention (which 
Myers did not present) can be found. 
Both Walker (1942) and Estes (1948) 
found that when an S^D established
under Condition X was periodically 
presented under Condition Y it in- 
creased the rate of a response with 
which it had never before been di- 
rectly associated. Gilbert and Sturdi- 
vant (1958) report similar results. It 
would not be surprising then to find 
that an S^D would facilitate a response
in the same situation where it had 
been associated with that response. 
Zimmerman (1957, 1959), however, 
contrary to Wyckoff, has obtained 
literally thousands of responses from 
his animals with nothing but second- 
ary reinforcement. In the same kind 
of control group as used by Wyckoff 
responding was very low, no more 
than operant level. It would seem 
valuable to repeat the Wyckoff ex- 
periment to ascertain just what vari- 
ables are operating. Wyckoff himself 
seems not to be dismayed by it all, for 
he has since attempted to develop a 
quantitative theory of secondary 
reinforcement (1959) using “cue 
strength” as the main variable in- 
fluencing the secondary reinforcing 
capacity of a stimulus. In any event, 
these results would not seem to influence the interpretation of experiments using such apparatuses as the straight runway or T maze, where trials are spaced.

The third contradictory report is 
Ratner’s (1956). After training his 
rats to approach a hopper at the 
sound of a click he introduced a bar 
into the box. Animals for whom the 
bar pressing was followed by the 
click made more presses than a no- 
click group, but did not go to the 
hopper more often. Ratner suggests that although the click was reinforcing, it was not an S^D for goal-approaching because the animals went
to the goal box only a small propor- 
tion of the time that the click was 
presented (about 20% on the first 
day). Often, however, either rats or 
pigeons will not go to the goal after 
every response on a schedule of 100% 
primary reinforcement. In fact, one 
of the things we expect a reinforcer to 
do is to “strengthen a habit” so that 
the habit will maintain itself without 
external reinforcement. In addition, 
in Ratner’s situation other cues, such 
as the sight or sound of the food being 
delivered, may also have been important as S^Ds in control of goal-approaching, and these may have been
absent during testing. This briefly 
reported experiment is suggestive, 
but inconclusive. 

In view of the total evidence, it 
seems that something about the na- 
ture of discrimination training is im- 
portant for the establishment of 
secondary reinforcement. While we 
cannot here go into the problem of 
the possible underlying mechanisms 
we are inclined to take the evidence 
at its face value. Since it may be possible to obtain secondary reinforcement without prior discrimination training, and since an S^D does not seem to be guaranteed to act as a reinforcer, we cannot accept cue function as prima facie evidence for secondary reinforcement. It seems clear, however, that such training generally does make secondary reinforcement stronger, and to this extent the S^D hypothesis may be considered
“correct,” providing us with some 
empirical basis for analyzing the 
failure of some of the shock-termina- 
tion experiments. 


Discrimination Training and the 
Shock-Termination Problem 


By and large, the shock-termina- 
tion experiments have been based on 
a Hullian-type assumption that the 
sufficient condition for establishing 
secondary reinforcement is pairing a 
neutral stimulus with some “primary” or other “secondary” reinforcer (see McGuigan, 1956). There
is no specific statement of prior dis- 
crimination training in this assump- 
tion (although Hull has written of the 
eliciting properties of the secondary 
reinforcer), and in none of the experi- 
ments thus far reported, save those of 
Smith and Buchanan and of Beck, 
has there been any attempt to use dis- 
crimination training. Therefore, it 
seems that the majority of negative 
and questionable results cannot 
necessarily be considered as evidence 
that the phenomenon does not exist, 
or as clearly opposing drive-reduc- 
tion theory. Rather, they can be con- 
sidered as evidence against only a 
particular statement as to how sec- 
ondary reinforcement can be estab- 
lished in conjunction with drive-re- 
duction reinforcement, i.e., that the 
simple pairing of a neutral stimulus 
and drive reduction is sufficient. If 
this statement is incomplete, and the 
evidence indicates that it is, then the 
experiments have hardly touched 
upon the main problem we are con- 
sidering—whether secondary rein- 
forcement can be established at all 
using shock termination as primary reinforcement. Granting that the
procedures used with food or water 
may not necessarily be correct for 
use with the termination of noxious 
drives, they still provide the best di- 
rection for research. 


SECONDARY REINFORCEMENT 
AND MOTIVATION 


Recalling Mowrer and Aiken’s sug- 
gestion that perhaps they failed to 
obtain secondary reinforcement be- 
cause their animals were not appro- 
priately motivated during testing, 
we look back over the other experi- 
ments and see that only in Crowder’s 
and Beck’s experiments were the 
animals shocked during testing. On 
the other hand, in experiments with 
hunger and thirst the subjects are 
tested for secondary reinforcement 
under the same deprivation condi- 
tions with which they were trained. 
The exception to this occurs, of 
course, in the few experiments which 
have studied secondary reinforce- 
ment as a function of motivation. 

Brown (1956) found in tests for 
secondary reinforcement that there 
was no interaction between hunger 
level and amount of responding in re- 
inforcement and nonreinforcement 
groups. The secondary reinforcement 
group exceeded the control group by the same amount at both “high” and “low” drive levels. She suggests that satiated animals receiving the secondary reinforcer would very likely have given no indication of the effectiveness of the reinforcing stimulus, but her low-drive group was not satiated.

Miles (1956) obtained similar re- 
sults. After training his rats on bar 
pressing under 23 hours’ food dep- 
rivation he gave them extinction 
testing under 0, 2.5, 5, 10, 20, and 40 
hours of deprivation. At each drive 
level the experimental group was 
superior to a comparably trained and 
deprived control group which did not get the secondary reinforcer. Like Brown, he found no regular trend for the difference between experimental and control groups to increase as a
function of deprivation time, al- 
though the three shortest deprivation 
groups showed less difference than 
the three highest. The experimental- 
control differences were not signifi- 
cant at the shorter deprivation in- 
tervals, but the overall functions 
were. 

Oakes (1956), on the other hand, 
did find an interaction. Varying S^D
presentation and food deprivation 
time in a factorial design, he found 
that both of these variables influence 
straight runway performance. His 
high-drive group with cue reinforce- 
ment ran faster than either the low- 
drive group with cue or the high-drive 
group without cue reinforcement. 

Wike and Casey (1954) claim to 
have demonstrated the secondary 
reinforcing property of food for 
satiated animals, finding that the 
satiated animals which got food 
(which they did not eat) in the goal 
box ran a straight runway faster than 
rats which did not get pellets in the 
goal box. Unfortunately, as these 
writers themselves report, there was 
no control for the effect of simply 
manipulating something in the goal 
box, for example, nonedible objects. 

Schlosberg and Pratt (1956) got 
very marked results indicating the 
difficulty of demonstrating secondary 
reinforcement with satiated subjects. 
Rats under 23 hours’ deprivation 
showed a consistent preference for the 
side of a T maze where they could see 
and smell food, but not eat it. When 
run while satiated, the preference fell to chance, only to return immediately to its former high level when the rats were again deprived. Rats initially run in the maze while satiated showed only chance preferences, and when switched to deprivation took as long to display the
preference as did the first group, run 
deprived from the start. The authors 
conclude that hunger was necessary 
for both learning and maintaining the 
preference. 

In a recent study Grice and Dyal 
gave three groups of rats 110 click- 
food pairings per animal while the 
subjects were under 23 hours’ food 
deprivation. After this training a 
bar was introduced into the appara- 
tus and the animals were tested for 
30 minutes. There was a highly significant reinforcement effect when a 23-hour deprived click-reinforcement group was compared with a 23-hour deprived no-click group, the means being 56.12 and 17.12. However, a satiated-click
group gave almost twice as many re- 
sponses as the hungry-no-click group, 
a mean of 30.88. This suggests that 
while secondary reinforcement may 
be obtainable with satiated subjects, 
it is certainly more powerful with de- 
prived. 
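The “almost twice” comparison can be checked directly from the three means reported above. The Python sketch below does only descriptive arithmetic on those means; the group labels and the function names `ratio_to` and `retained_fraction` are our own hypothetical choices, and because no variances are given here it says nothing about statistical significance.

```python
"""Descriptive arithmetic on the Grice and Dyal group means quoted above.

Group labels and function names are illustrative, not the authors'.
"""

# Mean bar presses in the 30-minute test, as reported in the text.
MEANS = {
    "deprived_click": 56.12,     # 23-hour deprived, click follows presses
    "deprived_no_click": 17.12,  # 23-hour deprived, no click
    "satiated_click": 30.88,     # satiated, click follows presses
}


def ratio_to(means, baseline_key):
    """Return each group's mean divided by the baseline group's mean."""
    base = means[baseline_key]
    return {group: mean / base for group, mean in means.items()}


def retained_fraction(means, full_key, partial_key, baseline_key):
    """Fraction of the full group's advantage over baseline seen in partial_key."""
    base = means[baseline_key]
    return (means[partial_key] - base) / (means[full_key] - base)


if __name__ == "__main__":
    ratios = ratio_to(MEANS, "deprived_no_click")
    for group, r in ratios.items():
        print(f"{group:18s} {MEANS[group]:6.2f}  x{r:.2f} of deprived no-click")
    frac = retained_fraction(MEANS, "deprived_click", "satiated_click", "deprived_no_click")
    print(f"click advantage retained under satiation: {frac:.2f}")
```

On these means alone, the satiated-click group responds at about 1.80 times the hungry no-click rate, so “almost twice” is fair, and it shows roughly a third (about 0.35) of the click advantage obtained under deprivation, consistent with the conclusion that the effect is more powerful with deprived animals.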

Seward and Levy (1953) obtained 
results difficult to interpret. Rats 
given food reinforcement on one side 
of a T maze continued to show a 
preference for this side when run 
satiated. On the other hand, with 
repeated training and testing they 
showed no increasing preference for 
the food side as would be expected if 
secondary reinforcement were operat- 
ing. 

These various experiments on the 
relationship of drive level and sec- 
ondary reinforcement, while few in 
number and sometimes contradic- 
tory, strongly suggest that any such 
effect obtained with satiated animals 
would be weak. Only Grice and 
Dyal obtained a difference even close to significance with satiated animals, excepting the doubtful results of Seward and Levy and of Wike and
Casey. If we now assume that mo- 
tivation is an important variable and 
look at the problem of the shock- 
termination experiments again we 
first have to determine the condition 
equivalent to “satiation.”


Satiation in Shock Experiments 


As the term is generally used, 
“satiation” refers to the absence of 
some motivating condition, or the
“complete gratification” of a need. 
In practice, this means that we have 
induced an animal to eat or drink as 
much as it will so that we are rea- 
sonably assured that it does not 
“need” food or water. In the shock 
situation satiation should then refer 
to the absence of shock. There is 
another component of shock situa- 
tions, however, namely, fear. Thus, 
even though shock were not present 
while testing in the same apparatus 
used in training there might be con- 
siderable motivation and we would 
not think of the subjects as satiated. 
Whether this fear component would 
be a powerful enough motivator to 
demonstrate secondary reinforcement 
clearly, assuming its demonstrability, 
would presumably be dependent 
upon such parameters as strength 
and number of shocks during train- 
ing. Since Miller (1951) has shown 
that animals will learn a variety of 
responses with escape-from-fear rein- 
forcement, we might expect that this 
same motivation would be potent 
enough to demonstrate secondary re- 
inforcement. Schoenfeld (1950) be- 
lieves that it can be so demonstrated, 
having hypothesized that in avoid- 
ance learning the animals continue 
to make the avoidant response because the proprioceptive stimuli associated therewith have taken on
secondary reinforcing properties. 

We might, on the other hand, get satiation if we tested the subjects in a completely different apparatus, as, for example, in the Littman and Wade experiment. Mason,⁶ however, has made the suggestion that in
this case the stimulus which was 
supposed to be reinforcing might 
have the opposite effect and arouse 
fear, inhibiting responses on account 
of its prior association with the 
shock situation. This is an especially 
interesting argument, for its implication for the drive reductionist is
that a signal associated with shock 
termination will arouse drive in a 
nonshock situation and reduce it 
when the organism is pained or fear- 
ful. One would not then expect to 
get positive results in a situation 
like that of Littman and Wade, but 
would expect results like those of Lee 
and of Mowrer and Aiken. 


Effects of Very Strong Shock 
During Testing 


Thus far we have been considering 
the effects of very low motivation or 
complete absence of motivation dur- 
ing tests for secondary reinforce- 
ment. What about the converse, can 
there be too much motivation during 
testing? None of the hunger or 
thirst experiments concerned with 
secondary reinforcement and drive 
have apparently had the subjects too 
highly motivated, but shock can 
easily produce an excitation level far 
beyond that of appetitive drives and 
has been shown to have ill effects on
performance (for example, as far back
as Yerkes & Dodson, 1908). How
does this deleterious effect of very
strong motivation apply in the shock
situation we have been considering?

Competing responses. As men- 
tioned previously, shock may arouse 
responses in competition with or 
facilitating the response being meas- 
ured. The combination of these op- 
posite effects in a single experiment 
would tend to wash out experimental 
differences. 

Psychophysics of drive-reduction re- 
inforcement. A second problem may 
be even more serious since it strikes 
at the nature of the mechanism of 
secondary reinforcement. Let us as- 
sume momentarily that the basis of 
secondary reinforcement is some 
form of anticipatory drive reduction. 
Campbell and Kraeling (1953) have 
shown that the effectiveness of shock 
reduction as a reinforcer is a func- 
tion of the proportion of the total 
shock reduced, not just the absolute 
amount of reduction. This approxi- 
mates a Weber fraction, which Camp- 
bell (1955) has shown even more 
clearly with sound reduction. If, 
then, an animal is being shocked dur- 
ing a test for secondary reinforce- 
ment, the amount of anticipatory 
drive reduction induced by a sec- 
ondary reinforcer could well be less 
than the differential threshold for re- 
inforcement. What might potentially 
be a “good” secondary reinforcer
could reduce such a small proportion
of the total drive in this situation
as to be ineffective. We could
therefore tell whether the reinforcement
“threshold” had been reached only if
reinforcement were demonstrated. If
it were not demonstrated, one could
argue that the threshold had not
been reached and that the hypothesis
had not been disproved at all. The way to
break out of this circularity would 
seem to be a careful study of a variety 
of shock and/or fear levels. The a priori
selection of a shock level for
testing would not seem to be adequate,
even though the test shock
were the same as in training, because 
the termination of a given shock in- 
tensity might be an effective rein- 
forcer for discrimination training 
but the shock inappropriate for con- 
tinual use during tests for secondary 
reinforcement. 
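A toy calculation makes the proportional argument concrete. This is a minimal sketch, not Campbell and Kraeling's analysis. It assumes, hypothetically, that a reduction is effective only if it removes at least a fixed proportion of the total intensity. That proportion, `WEBER_THRESHOLD`, is an invented value, and the intensity units are arbitrary.

```python
# Hypothetical differential threshold for reinforcement, expressed as a
# proportion of the total (shock or drive) intensity. Illustrative value only.
WEBER_THRESHOLD = 0.10


def proportional_reduction(total_intensity: float, reduction: float) -> float:
    """Return the reduction as a fraction of the total intensity."""
    if total_intensity <= 0:
        raise ValueError("total_intensity must be positive")
    return reduction / total_intensity


def exceeds_threshold(total_intensity: float, reduction: float,
                      threshold: float = WEBER_THRESHOLD) -> bool:
    """True if the proportional reduction reaches the (hypothetical) threshold."""
    return proportional_reduction(total_intensity, reduction) >= threshold


# The same absolute reduction (5 arbitrary units) against two background levels.
print(exceeds_threshold(total_intensity=20, reduction=5))   # True: 0.25 of the total
print(exceeds_threshold(total_intensity=200, reduction=5))  # False: 0.025 of the total
```

Under these assumptions, the same absolute reduction passes the threshold against weak background stimulation and fails against strong stimulation.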
Motivation During Training 

The role of motivation during dis- 
crimination training is not con- 
sidered in detail because one can 
observe and measure discrimination 
performance with enough accuracy 
to tell when a discriminative response 
has been well-learned. There is also 
good evidence that shock intensity is 
relatively unimportant if the dis- 
crimination to be learned is simple 
(for example, Hammes, 1956). 

Summarizing these various lines of 
evidence, then, it may be suggested 
that some amount of aversive mo- 
tivation will probably be necessary to 
demonstrate secondary reinforce- 
ment derived through association 
with pain reduction. The precise 
level is a matter to be determined 
empirically, but intensities either too 
high or too low may negate other- 
wise positive results. 


SOME SUGGESTED EXPERIMENTAL 
APPROACHES 


Only Crowder, to the writer’s 
knowledge, has really attempted the 
“obvious” experiment, using the de- 
sign that Bugelski used over 20 years 
ago—extinguishing a response with 
and without the use of an hypothe- 
sized secondary reinforcer. Crowder's 
results suggest that this might be a
fruitful approach, especially since he 
apparently did get positive results 
even without discrimination train- 
ing. A powerful procedural advance 
would be an extensive use of partial 
reinforcement, particularly the method
that Zimmerman (1957, 1959) has
reported for use with food and water
reinforcement. With this technique,
the $S^D$ follows bar pressing only part
of the time, and reinforcement follows
the $S^D$ only a fraction of the
time that it occurs. Using grid shock 
in conjunction with a discrimination- 
training procedure and bar press 
training, one might produce satisfac- 
tory secondary reinforcement during 
extinction with the cue stimulus as 
the reinforcer. 
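In the double intermittent procedure, responses produce the $S^D$ only part of the time, and the $S^D$ is followed by reinforcement only part of the time. Its logic can be sketched as a small state machine. The fixed ratios used here are a simplifying assumption. The class and parameter names are hypothetical, and the source does not give schedule values.

```python
from dataclasses import dataclass


@dataclass
class DoublePartialSchedule:
    """Hypothetical fixed-ratio version of a two-stage intermittent schedule."""

    presses_per_cue: int      # presses needed to produce the S^D (assumed fixed ratio)
    cues_per_reinforcer: int  # S^D presentations per reinforcement (assumed fixed ratio)
    presses: int = 0
    cues: int = 0
    reinforcers: int = 0

    def __post_init__(self) -> None:
        if self.presses_per_cue < 1 or self.cues_per_reinforcer < 1:
            raise ValueError("ratios must be at least 1")

    def press(self) -> str:
        """Record one bar press and return what follows it."""
        self.presses += 1
        if self.presses % self.presses_per_cue != 0:
            return "nothing"
        self.cues += 1
        if self.cues % self.cues_per_reinforcer != 0:
            return "cue"              # S^D presented, no reinforcement
        self.reinforcers += 1
        return "cue+reinforcer"       # S^D followed by reinforcement


schedule = DoublePartialSchedule(presses_per_cue=3, cues_per_reinforcer=4)
outcomes = [schedule.press() for _ in range(24)]
print(schedule.cues, schedule.reinforcers)  # 8 2
```

With these ratios, most $S^D$ presentations are not followed by reinforcement. That is the property the proposed extinction test relies on.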

Fear motivation might be even 
more effective than direct shock, for 
many of the problems encountered 
with grid shock could be avoided. 
Suppose that we put a rat into a 
shock compartment with a hinged 
door in one of the walls. We block 
off this door and shock the animal 
severely, à la Miller. Now we turn 
off the shock and make the door 
available, gradually training the ani- 
mal in repeated trials to push open 
the door and run out of the shock 
compartment to a safe compartment. 
We run 10 such trials a day, shock- 
ing the animal before training each 
day to ensure that the level of fear
is high. We now introduce an $S^D$,
such as a buzzer. The rat is put in 
the box and the panel is locked, not 
to be opened until the buzzer sounds. 
Soon the animal learns not to push 
the door until the signal is presented. 
Then, à la Zimmerman, we put this 
panel pushing on a partial schedule, 
such that when the buzzer sounds 
the animal does not always get to 
escape when he pushes on cue. 
Rather, the buzzer ceases when the 
animal responds, then comes on 
again after another minute or so. 
We slowly build up this ratio so that 
the buzzer sounds several times be- 
fore the animal finally is reinforced. 
Now comes the test period. Everything
is the same on this day except
that there is a bar in the box, which
when pressed sounds the buzzer for 
a period of time, but the animal is 
not allowed to escape. We can thus 
test the reinforcing capacity of the 
buzzer in the same manner that we 
test for secondary reinforcement with 
appetitive motivation. In an analogous
manner we could have introduced the
bar during training itself, then tested
for secondary reinforcement by presenting the buzzer
(but no escape) during extinction. 
The critical aspects of either of 
these designs, in view of the foregoing 
discussions, are that (a) the buzzer 
is established as a discriminative 
stimulus, contiguous with escape 
from a fear-arousing situation, and 
(b) the motivational conditions dur- 
ing testing are the same as those dur- 
ing training. 

There is a variety of aversive
stimulus situations which could be
used as alternatives to shock and
fear. In a study of cold stimulation
and heat reinforcement, for example,
Carlton and Marks (1957) report that it was very
difficult to establish a stable bar 
pressing rate unless a cue stimulus 
preceded the onset of the heat. They 
interpret this to mean that the cue is 
serving as a secondary reinforcer. 
While this may not be a stringent 
test of reinforcement, the technique 
does seem readily amenable to more 
direct tests. One might use the 
cessation of strong light or sound as 
a reinforcer in the same way. Air 
deprivation and reinforcement would 
provide an interesting test, but an 
extinction procedure with the sub- 
jects deprived would rapidly become 
confounded. 


CONCLUSIONS 


This review of the experimental
literature leads us to conclude that
there is almost no evidence to show
that secondary reinforcement can be 
established by the association of a 
neutral stimulus with noxious-drive 
reduction. An analysis of the experi- 
ments suggests that they have not 
been completely adequate for a 
variety of reasons, depending upon 
the particular experimental designs 
used. Generally speaking, there have 
been three major problems. First, 
there has often been a lack of cer- 
tainty whether stimuli were eliciting 
previously learned responses, or reinforcing
them during tests for reinforcement. Second, in almost none
of the experiments has the secondary- 
reinforcer-to-be first been established 
as a cue, although the literature 
strongly suggests that this procedure 
is advisable. Third, there has been 
relatively little consideration of the 
role of motivation during tests for 
secondary reinforcement. 

It would seem that a first step in 
attacking this problem is to design 
experiments which provide maximal
opportunity for the phenomenon to
be demonstrated. It should be de- 
termined whether secondary rein- 
forcement can be established at all, 
using methodologically sound and 
unambiguous procedures, before go- 
ing on to test different hypotheses 
about the establishment of secondary 
reinforcement. Until this is done it 
seems meaningless to use the negative 
results thus far obtained as evidence 
against a concept as general as drive 
reduction. In the event that the 
phenomenon can never be demon- 
strated we may have a finding detri- 
mental to drive-reduction theory, 
but certainly not to reinforcement 
theory since there are a number of 
alternative explanations for the op- 
eration of reinforcement. Reinforce- 
ment theory may be forced into the 
acceptance of some kind of hedonic 
axiom, however, and agree with P. T. 
Young that getting rid of something 
bad is not the same as getting some- 
thing good. 

The repeated measurements analysis
of variance designs have been
popular in psychological research for 
a number of years. The advantage of 
these designs has been mainly that 
of economy, relative to number of
subjects (Ss), but increased precision
may result; the experimental error is
reduced when variance due to Ss is
removed. The simplest design of this
nature is that of the treatments ×
subjects design in which n Ss receive
all of k treatments. More complex 
designs involve the latin square and 
modified latin squares. 
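The following is a minimal Python sketch of the simplest design, the treatments × subjects analysis. It assumes one score per subject per treatment and the usual partition into T, S, and TS, with T tested against TS. The function name and the three-subject data are made up for illustration.

```python
from statistics import mean


def treatments_by_subjects_anova(scores):
    """Treatments x Subjects ANOVA for a complete n x k table.

    scores[i][j] is the score of subject i under treatment j (one score per cell).
    Returns sums of squares, df, mean squares, and the F ratio for treatments.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 Ss and 2 treatments")
    grand = mean(x for row in scores for x in row)
    subj_means = [mean(row) for row in scores]
    treat_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_t = n * sum((m - grand) ** 2 for m in treat_means)
    ss_s = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_ts = ss_total - ss_t - ss_s  # residual: the Treatments x Subjects interaction

    df_t, df_s, df_ts = k - 1, n - 1, (k - 1) * (n - 1)
    ms_t, ms_s, ms_ts = ss_t / df_t, ss_s / df_s, ss_ts / df_ts
    return {
        "SS": {"T": ss_t, "S": ss_s, "TS": ss_ts},
        "df": {"T": df_t, "S": df_s, "TS": df_ts},
        "MS": {"T": ms_t, "S": ms_s, "TS": ms_ts},
        "F_T": ms_t / ms_ts,  # treatments are tested against the TS mean square
    }


table = treatments_by_subjects_anova([[1, 2, 3], [2, 3, 4], [3, 5, 4]])
print(table["df"], table["F_T"])  # F is 7 up to floating-point rounding
```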

Whenever repeated measurements 
designs have been used the procedure 
has usually been to counterbalance 
the order of appearance of the treat- 
ments so as to avoid any practice or 
learning effect which may be present. 
In the simple case this involves hav- 
ing all Ss take the treatments in one 
order and then reversing the order on 
subsequent trials (intrasubject coun- 
terbalancing), or some Ss take the 
treatments in one order and other Ss 
receive different orders of presenta- 
tion (intersubject counterbalancing). 
A combination of these two proce- 
dures probably is used most frequent- 
ly. However, the practice effect is 
not partitioned in these analyses. In 
the more complex designs, the in- 
vestigator is able to separate a source 
of variation due to practice or order. 
In some cases the effect due to the 
order by treatment interaction can 
also be partitioned. 
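Both counterbalancing schemes can be generated mechanically. For intersubject counterbalancing, the sketch below uses a cyclic latin square. This is one of many possible squares, chosen here only for simplicity. For intrasubject counterbalancing, it uses an order followed by its reversal. Both function names are hypothetical.

```python
def cyclic_latin_square_orders(treatments):
    """Intersubject counterbalancing: k orders in which each treatment occupies
    each serial position exactly once (a cyclic latin square)."""
    k = len(treatments)
    return [[treatments[(start + pos) % k] for pos in range(k)] for start in range(k)]


def reversed_sequence(order):
    """Intrasubject counterbalancing: an order followed by its reversal (AB -> ABBA)."""
    return list(order) + list(reversed(order))


orders = cyclic_latin_square_orders(["A", "B", "C"])
for order in orders:
    print("".join(order))                       # ABC, then BCA, then CAB
print("".join(reversed_sequence(["A", "B"])))   # ABBA
```

In the latin square, every treatment appears once in each serial position across the set of orders. This balances the main effect of position but not its interactions, as the later cases show.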

1 Now at Lake Forest College.

The repeated measurements designs have been considered by numerous individuals (e.g., Alexander,
1947; Gaito, 1958a, 1958b; Garrett 
& Zubin, 1943; Grant, 1944, 1948; 
Gourlay, 1955; Hilgard, 1951; Ko- 
gan, 1948, 1953; Lindquist, 1947, 
1953; Lubin, 1954, 1957, 1958; Mc- 
Nemar, 1951, 1955; Peters, 1944). 
Likewise, the criticisms directed at 
these designs have been numerous. 
The major indicated defect in this
simple design is that the treatment 
effect will be confounded with any 
practice effect or, if counterbalancing 
is used, the main effects will be bal- 
anced but any practice effect will 
appear in the interaction effect, thus 
producing a negative F test bias 
(Type II error). The use of latin 
squares has been criticized because 
it has been maintained that interac- 
tions must be zero for valid use of 
this design. However, papers by 
Gourlay (1955) and Gaito (1958b) 
have indicated that this assumption 
is not always required. The latter 
individual employed the expected 
value of mean square [E(MS)] concept
and showed that the important
consideration as to the suitability of 
the latin square model depends on 
the number of random variates in- 
cluded in the experiment. Work by 
mathematical statisticians (Wilk & 
Kempthorne, 1957) has also indi- 
cated that interactions do not neces- 
sarily have to be zero. 

The overall problem of repeated 
measurements designs is a complex 
one, and a satisfactory treatment has 
not been effected. However, the
E(MS) concept (Anderson & Bancroft,
1952; Cornfield & Tukey, 1956;
Greenwood, 1956; Kempthorne, 1952;
Wilk & Kempthorne, 1957) provides a suitable technique
for a clear investigation of this prob- 
lem. The purpose of this paper is to 
extend this approach to a number of 
repeated measurements designs. To
investigate this problem adequately
we shall treat the six cases which are
most frequently used: (a) all Ss receive
the Treatments (T) in the same
Order (O); (b) the Order of Treatments
is randomized for each S; (c)
the Order is balanced (assumption
that all interactions containing order
are zero); (d) the Order is balanced
and analyzed as a single latin square
(no assumptions about interaction);
(e) the Order is balanced without
interaction assumptions but ana- 
lyzed by a modified latin square de- 
sign (e.g., Lindquist Type II design); 
and (f) the Order is balanced without 
assumptions but analyzed as a simple 
Treatments X Subjects design. 


REPEATED MEASUREMENTS 
DESIGNS 
Case I. Same Order 


This represents the simplest type of
repeated measurements design. All
n Ss receive the k Treatments in the
same Order. Table 1 indicates the 
E(MS). The rule for obtaining the 
E(MS) in a complete factorial design 
is as follows: E(MS) is $\sigma_e^2$ (variance
due to error) plus the $\sigma^2$ term whose
subscript corresponds to the main
or interaction effect of concern. It
also includes all $\sigma^2$ terms which represent
interactions with this main or
interaction effect, provided the effects
not included in the main or
interaction effect are all random. For
example, in a two-variable design
(A × B) in which A is a random effect,
the E(MS) for B would be $\sigma_e^2 + \sigma_B^2 + \sigma_{AB}^2$;
for A, $\sigma_e^2 + \sigma_A^2$ (see Anderson & Bancroft, 1952; Cornfield &


TABLE 1 


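COMPONENTS OF VARIANCE INCLUDED IN
MEAN SQUARES FOR CASE I:
ORDER EFFECT PRESENT


Source  E(MS)
T       $\sigma_e^2 + s\sigma_T^2 + \sigma_{TS}^2 + s\sigma_O^2$
S       $\sigma_e^2 + t\sigma_S^2$
TS      $\sigma_e^2 + \sigma_{TS}^2$


Note.—In Tables 1-6 all main effects are fixed except
S, which is random.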


Tukey, 1956; Greenwood, 1956; 
Kempthorne, 1952; Wilk & Kemp- 
thorne, 1957). The coefficient for
$\sigma_e^2$ is 1; all other $\sigma^2$ terms have coefficients
which are equal to the number
of replications (n) times the number
of the levels of the variables which
are not included in the subscript of
the $\sigma^2$ under consideration.² However,
because of confounding aspects
other components will be included in 
some mean squares. These can be 
determined intuitively. 
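The complete-factorial rule can be mechanized. The sketch below is a minimal Python version under two assumptions. First, every factor is either fixed (all of its levels are in the experiment) or random (sampled from an effectively infinite population). The Cornfield and Tukey coefficient 1 − x/X is therefore either 0 or 1. Second, only the complete-factorial rule is applied, so confounded components such as $\sigma_O^2$ in Case I are not added. The function name and the dictionary layout are hypothetical.

```python
from itertools import combinations
from math import prod


def expected_mean_square(effect, factors, n=1):
    """Coefficients of the variance components in E(MS) for one effect.

    factors maps a factor name to (number_of_levels, is_random); its insertion
    order fixes the order of letters in component names such as "TS".
    effect is an iterable of factor names, e.g. {"T"} or {"T", "S"}.
    n is the number of replications per cell. "e" denotes the error term.
    """
    names = list(factors)
    effect = frozenset(effect)
    if not effect or not effect <= set(names):
        raise ValueError("effect must be a non-empty subset of the factors")
    terms = {"e": 1}
    for r in range(len(effect), len(names) + 1):
        for combo in combinations(names, r):
            u = frozenset(combo)
            if not effect <= u:
                continue
            # Every factor in u but outside the effect must be random; a fixed one
            # has coefficient 1 - t/T = 0, which removes the term.
            if any(not factors[f][1] for f in u - effect):
                continue
            # n times the levels of all factors not in the component's subscript.
            terms["".join(combo)] = n * prod(factors[f][0] for f in names if f not in u)
    return terms


design = {"T": (3, False), "S": (10, True)}   # 3 fixed Treatments, 10 random Ss
print(expected_mean_square({"T"}, design))      # {'e': 1, 'T': 10, 'TS': 1}
print(expected_mean_square({"S"}, design))      # {'e': 1, 'S': 3}
```

With n = 1, as in Table 1, the e and TS components cannot actually be separated, even though the rule lists them separately.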


2 The most general treatment of coefficients
for a complete factorial design is by Cornfield
and Tukey (1956). They use 1 − x/X as
coefficients for $\sigma^2$ terms, where x refers to the number
of replications and X is the total population.
1 − x/X is also used as a coefficient for
x in $\sigma^2$ terms due to interactions if x is not involved
in the mean square in question. These coefficients
serve to suppress terms completely
when they are fixed effects (if x = X, then the
coefficient is zero). If x is very small and X is
infinite or very large, then the coefficient goes
to 1. These coefficients also reduce the $\sigma^2$
terms (are between zero and 1) when the
populations are finite but larger than the
samples. We shall not be concerned with this
latter situation. Thus in Table 1 the coefficient
for ø throughout the table and for
$\sigma_{TS}^2$ in T is 1 − s/S. Inasmuch as the sample of
Ss is small whereas the population S is
usually very large, the coefficient becomes 1.
The coefficient for $\sigma_{TS}^2$ in S is 1 − t/T; however,
t = T because we have all Treatment levels in
our experiment. Thus the coefficient is zero
and $\sigma_{TS}^2$ vanishes. The coefficient for $\sigma_{TS}^2$ in
TS is 1 because 1 − t/T does not appear for
either the T or S portions. Both are involved
in the TS mean square. The coefficient for
$\sigma_{TS}^2$ in T is 1 − s/S, which becomes 1. All coefficients
are multiplied by n, which in our
example is 1.
In Table 1 Subjects represents a
random variate and thus $\sigma_{TS}^2$ appears
in the Treatments effect. No Order
× Treatment effect, or any interaction
containing Order, is present
because only one order is involved.³
Furthermore, $\sigma_T^2$ and $\sigma_O^2$ are confounded
and cannot be separated.
The coefficient for $\sigma_O^2$ is the same as
for $\sigma_T^2$. Of course, if $\sigma_O^2 = 0$ then the
test of T by TS is a valid one. If $\sigma_O^2$
is not zero then positive bias occurs
in the F test of T, a tendency for too
many significant effects to be reported (Type I error).


Case II. Randomization of Order 


Because we suspect the presence of 
an Order effect (and interactions in- 
volving this effect) we decide to ran- 
domize the Order of presentation of 
the Treatments to each individual 
separately. The result of this pro- 
cedure is indicated by Table 2. In 
this case $\sigma_O^2$ has been removed from
T. Any effects of Order or any interaction
will appear in $\sigma_e^2$ and, thus, be
felt in all effects, so that the F test of
T will be a valid one. However, $\sigma_e^2$
will be inflated if the Order or interaction
effects are present.


Case III. Balanced Design—Inter- 
actions Zero 


This situation represents the 
limited example which has been con- 
sidered in detail by Gaito (1958a). 
Hilgard (1951) and Lindquist (1953), 
as well as others, have been concerned 
with this case. Because of possible 
practice effects we have one or more 
Ss take one Order, one or more other 


3 Even though we speak of Order effects and 
Order interactions, it is actually trial effects 
and trial interactions which are involved, be- 
cause differences between trials, or differential 
effects of trials for different Ss, indicate that 
the different Orders do not give the same re- 
sults. However, it is usual to speak in terms 
of the former. 


TABLE 2 


COMPONENTS OF VARIANCE INCLUDED IN
MEAN SQUARES FOR CASE II:
RANDOMIZATION OF ORDER


Source  E(MS)
T       $\sigma_e^2 + s\sigma_T^2 + \sigma_{TS}^2$
S       $\sigma_e^2 + t\sigma_S^2$
TS      $\sigma_e^2 + \sigma_{TS}^2$


Ss take another Order, etc., but as- 
sume that all Order interactions are 
zero. Table 3 indicates that $\sigma_O^2$ now
appears in TS, thus making for negative
bias in the F test of T. The
balancing procedure equalizes the 
various levels of each main effect but 
not the interaction components. If 
more than one factor is included in 
the experiment, the levels of all main 
effects and interactions not involving 
the random effect are equalized, but 
the interaction levels including the 
random effect are not. Furthermore, 
the magnitude of this inflation tends to
increase with increasing order of interaction
(e.g., the mean square of $T_1T_2S$
is greater than that of $T_1S$ or $T_2S$).
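A small Monte Carlo sketch can contrast Cases I to III. Everything in it is an assumption chosen for illustration:

- nine Ss and three treatments, with no true treatment effect;
- a linear practice effect of 2 units per trial;
- unit-variance subject and error terms.

The critical value 3.63 is the tabled upper 5% point of F with 2 and 16 df, so the code is only valid for that table size.

```python
import random

N_SUBJECTS, N_TREATMENTS = 9, 3
F_CRIT = 3.63  # tabled upper 5% point of F(2, 16); valid only for 9 Ss x 3 treatments


def avg(values):
    values = list(values)
    return sum(values) / len(values)


def f_for_treatments(scores):
    """F = MS(T) / MS(TS) for an n x k Treatments x Subjects table."""
    n, k = len(scores), len(scores[0])
    grand = avg(x for row in scores for x in row)
    subj = [avg(row) for row in scores]
    treat = [avg(row[j] for row in scores) for j in range(k)]
    ss_t = n * sum((m - grand) ** 2 for m in treat)
    ss_ts = sum((scores[i][j] - subj[i] - treat[j] + grand) ** 2
                for i in range(n) for j in range(k))
    return (ss_t / (k - 1)) / (ss_ts / ((k - 1) * (n - 1)))


def order_for(case, subject, rng):
    """Treatment given at each trial position for one S."""
    base = list(range(N_TREATMENTS))
    if case == "same":            # Case I: every S gets T0, T1, T2
        return base
    if case == "randomized":      # Case II: a fresh random order for each S
        rng.shuffle(base)
        return base
    if case == "balanced":        # Case III: cyclic latin square, 3 Ss per order
        g = subject % N_TREATMENTS
        return [(g + pos) % N_TREATMENTS for pos in range(N_TREATMENTS)]
    raise ValueError(f"unknown case: {case}")


def rejection_rate(case, practice_step=2.0, reps=2000, seed=1):
    """Share of simulated null experiments (no treatment effect) with F > F_CRIT."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        scores = []
        for s in range(N_SUBJECTS):
            subject_effect = rng.gauss(0, 1)
            row = [0.0] * N_TREATMENTS
            for pos, treatment in enumerate(order_for(case, s, rng)):
                # score = subject effect + linear practice at this trial + error
                row[treatment] = subject_effect + practice_step * pos + rng.gauss(0, 1)
            scores.append(row)
        if f_for_treatments(scores) > F_CRIT:
            rejections += 1
    return rejections / reps


rates = {case: rejection_rate(case) for case in ("same", "randomized", "balanced")}
print(rates)
```

Under these assumptions, one would expect three outcomes. The same-order rate should be near 1 (positive bias, Case I). The randomized rate should be near .05 (Case II). The balanced rate should be far below .05 (negative bias, Case III).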


Case IV. Single Latin Square—Inter- 
actions Present 


The single latin square design has 
been used infrequently in recent years 
in psychology, possibly because of the 
criticisms of Lindquist (1953) and of 
McNemar (1951) concerning the fre- 
quent presence of interactions. This 
design (in which each S has a differ- 
ent Order) has been considered by 
Gaito (1958b) as the One Random 


TABLE 3 


COMPONENTS OF VARIANCE INCLUDED IN 
MEAN SQUARES FOR CASE III:
BALANCING OF ORDER BUT NO
ORDER INTERACTIONS PRESENT


Source  E(MS)
T       $\sigma_e^2 + s\sigma_T^2 + \sigma_{TS}^2$
S       $\sigma_e^2 + t\sigma_S^2$
TS      $\sigma_e^2 + \sigma_{TS}^2 + s\sigma_O^2$
TABLE 4


COMPONENTS OF VARIANCE INCLUDED IN MEAN SQUARES FOR CASE IV:
SINGLE LATIN SQUARE ANALYSIS WITH NO ASSUMPTION CONCERNING INTERACTIONS


T oe ble? tont tout (1 —2/ion 
0 Cole d ton ton + (1 —2/Dern® 
S at -ttot tout —1/hon 
Residual otto tou to +2) 
Note.—Since o = t = s, the coefficients for all main effects are given as t.


Variate Model. That article also
deals with the Zero, Two, and Three
Random Variates Models.
The rule for the complete factorial de- 
sign presented above must be modi- 
fied to deal with this incomplete 
factorial design. The above rule is 
applied first. Then the following 
additions are included. Residual con- 
tains all interactions, and each main 
effect is confounded with the triple 
interaction and the double inter- 
action containing the other two 
effects. The paper by Wilk and 
Kempthorne (1957) presents a gen- 
eralized derivation for latin square 
designs and the coefficients for each 
g? term in Table 4 are based on that 
derivation. ø and all interaction g? 
terms except gu have a coefficient of 
1. The coefficient of o? for each main 
effect is t. The coefficient for G.u? in 
the Residual and the two fixed effects 
(T and O) is 1 − 2/t; in S the coefficient
gets closer to one (1 − 1/t) inasmuch
as the random effect is involved.

In this case the F test of T is nega- 
tively biased unless ø+? is zero. Even 
though the F test is unbiased when 
T is zero it is not a valid F test be- 
cause it is not distributed as the F 
distribution. A valid F test requires 
that the interactions in the mean 
squares of both the main effect and 
the Residual must be random, nor- 
mally distributed, and be a compo- 
nent that would be expected in the 
mean square as indicated by the rule 


above. If these conditions are not 
satisfied the result is a ratio of two 
noncentral chi square statistics di- 
vided by their respective degrees of 
freedom, and the distribution de- 
pends upon the parameters of un- 
wanted components, in the present 
situation g, and Gst. For a valid 
and unbiased F test, G’, Cus, and 
7,1? Must be zero. 


Case V. Lindquist Type II Design— 
Interactions Present 


This situation is the same as in 
Cases III and IV except that groups 
of Ss take each Order; also we allow 
all Order interactions to be present 
and analyze the results as a Lindquist 
Type II design (Table 5). This de- 
sign is actually a modification of the 
single latin square design and the 
arguments presented above for the 
One Random Variate Model are per- 
tinent here. The Residual contains 
gè and all possible interactions except 
S, which has been removed. Each 
main and interaction effect contains 
$\sigma_e^2$, variance due to itself, and the
interaction of the effect with other 
effects which are random. Further- 
more, because of the confounding 
aspects of the latin square each main 
effect includes variance due to the 
other two factors and variance due 
to the triple interaction. In this de- 
sign if only two Treatments and two 
Orders are involved, the T × O(w)
effect disappears (Lindquist, 1953). 

The F test of T and T × O(b) will
TABLE 5


COMPONENTS OF VARIANCE INCLUDED IN MEAN SQUARES FOR CASE V: 
BALANCING OF ORDER, ALL INTERACTIONS PRESENT, AND ANALYZED 
AS LINDQUIST TYPE II DESIGN


Between Ss 
Groups or T × O(b)
Between Ss within Groups 


Within Ss 
Treatments 
Order 
T × O(w)
Residual 


oe tto? +sta (b) + (1 —1/t) ost 
of +toe+(1—1/t) este 


otso tHon Hos t (1—2/t)0st 
oê + toe? +0: +o? +1 —2/t)ost0” 
oê t+ s'o (w) + (1 —2/t) oto 
oê tHon? tHo + (1 —2/t) ote? 


Note.—Because o = t, t is used as the coefficient for both $\sigma_T^2$ and $\sigma_O^2$; s′ refers to the number of Ss in each group,
s to the number of Ss in the experiment.


not be biased even if all interactions 
are present. However, the F test of 
T × O(w) will be negatively biased.
The unbiased tests will not be distributed
as F because of nuisance
parameters, e.g., in Treatments, the
Cu and Cao. Thus the Type II
“mixed” design, which is one of a
number of designs which Lindquist
recommends for counterbalancing
purposes (1953, p. 163, Ch. 13), appears
to give unbiased (but nonvalid)
results for the main effects, when all 
interactions are present. 

The advantage of the Type II de- 
sign is that it allows for a separation 
of both the Order and the Treatments
× Order effects. However, if the
latter is present the test of the
Treatments effect may not be meaningful,
even though unbiased. If
Order were a random effect, then the
test of the Treatments effect would be meaningful.
However, usually Order represents
a fixed effect. Thus if the
interaction is of a “reversal” type
(i.e., one Treatment is most effective
with one Order of presentation
whereas other Treatments are more
effective with different Orders), an F
test of T would be meaningless. However,
in a “continuous spread” type
of interaction (i.e., the rank order of
the Treatments is the same for all
Orders but the difference between
Treatments varies with the Order of 
presentation), a generalization based 
on the F test of T would be meaning- 
ful. 

The Type II design represents one 
of a large number of “mixed” de- 
signs. Readers interested in the 
E(MS) for more of these should consult
Harter and Lum (1955).


Case VI. Balanced Treatments × Subjects Design—Interactions Present


Let us take the same procedure as 
in Case V but analyze the results as a 
simple Treatments X Subjects design. 
This result is indicated in Table 6. 
The E(MS) for T is the same as in 
Table 5; S contains all the between- 
subjects variance terms of that table; 
and TS contains the O, T × O(w), and
Residual components. Note that
$\sigma_O^2$ is contained in TS as was indicated
for Case III. The reader should
note also that TS is the same in
Tables 3 and 6, except that in the
latter table Order interaction
$\sigma^2$ terms are included, while in Table 3 these
are missing. For the E(MS) of Table 
3 it was assumed that Order inter- 
actions were not present. 

As is obvious from Table 6, the F 
test of T will be negatively biased be- 
cause two unwanted components, 


$\sigma_O^2$ and $\sigma_{T \times O(w)}^2$, will be included in the
denominator. The defects occurring
in this situation are more severe than
in the above cases.


TABLE 6
COMPONENTS OF VARIANCE INCLUDED IN MEAN SQUARES FOR CASE VI:
BALANCING OF ORDER AND ALL ORDER INTERACTIONS PRESENT


T    oè +soP tou? tast (1 —2/t)oste?
S    aè ttot +s tot (b) +(1 —1/t)oete*
TS   oF lot +ou +o tsau (w) + (1 —2/t) oat


DISCUSSION


From the six cases presented above
it is obvious that the possible defects
which may occur in repeated measurements
designs are extreme. It
would appear that if one does have a
repeated measurements design, the
safest procedure would be to randomize
the order of treatments so that
order and all interactions containing
order would be included in $\sigma_e^2$ and
appear in all effects, unless he has
strong reasons for believing that certain
interactions are not present.
However, one might use a Lindquist
Type II design. In the former design
unbiased and valid F tests of the
treatment effect are obtained. In the
latter design unbiased tests of the
treatment effect are obtained but
these tests are not distributed as the
F distribution and will not be meaningful
if a “reversal” type interaction
between order and treatments has
occurred.

All of these counterbalancing examples
have been of an intersubject
nature. However, the results of
intrasubject counterbalancing would
be similar. For example, if intrasubject
counterbalancing were used such
that each S would receive two or
more sequential orders (e.g., if two
treatments, the Ss would take only
the ABBA or BAAB orders), $\sigma_O^2$
would be confounded with $\sigma_T^2$ in the
treatments effect. If inter- and intrasubject
counterbalancing were to be
employed, some Ss would receive two 
or more orders of presentation while 
other Ss would receive some reversal 
of these orders (e.g., if two treat- 
ments, some Ss would have the 
ABBA sequence while others would 
have the BAAB sequence). In this 
case if a subjects × treatments
analysis is followed and a practice effect
is present which is constant from 
trial to trial for all Ss, no bias occurs 
in either the main effects or the inter- 
action; however, the within-cells term 
will be inflated. If the practice effect 
is not constant from trial to trial, and 
is either the same or not for all Ss, 
then inflation will occur in both the 
interaction and within-cells terms. 
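To check whether an ABBA or BAAB sequence balances practice, sum the practice effect each treatment receives. In this sketch the two practice functions are hypothetical. One adds a constant amount per trial, and the other adds an amount that grows from trial to trial.

```python
def treatment_practice_totals(sequence, practice):
    """Total practice effect each treatment receives over its trial positions.

    sequence: a string such as "ABBA"; practice(trial) gives the practice
    effect at 1-based trial number `trial`.
    """
    totals = {}
    for trial, treatment in enumerate(sequence, start=1):
        totals[treatment] = totals.get(treatment, 0) + practice(trial)
    return totals


def linear(trial):
    return trial          # the same gain from every trial to the next


def accelerating(trial):
    return trial ** 2     # the gain grows from trial to trial


print(treatment_practice_totals("ABBA", linear))        # {'A': 5, 'B': 5}
print(treatment_practice_totals("ABBA", accelerating))  # {'A': 17, 'B': 13}
```

With a constant gain, both treatments receive the same total. With an accelerating gain, A (trials 1 and 4) accumulates more practice than B (trials 2 and 3).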
The above considerations should 
make one cautious concerning the 
use of a repeated measurements de- 
sign. However, only the effects of 
order and interactions have been dis- 
cussed. There is another source of 
contamination in the repeated mea- 
surements designs, i.e., correlated ob- 
servations. It has been assumed by 
many investigators that by partition- 
ing a source of variation attribu- 
table to Ss, the problem of correla- 
tion has been handled. That this as- 
sumption is not true has been 
indicated by a number of people (e.g., 
Box, 1954; Danford & Hughes, 1957; 
Geisser & Greenhouse, 1958; Lubin, 
1954, 1957, 1958; Scheffé, 1956).
Box (1954) indicates that when 
there is moderate correlation within 
rows (in psychological experiments 
the row variable would represent Ss),
a great distortion occurs in the prob- 
ability levels for between-rows com- 
parisons but little distortion is intro- 
duced for between-columns compari- 
sons. The maximum correlation that 
Box studied was +.40. In the case of 
the negative correlation the percent 
probability for the test of columns 
(Treatments) was 5.90 rather than 
5.00 (which would result when corre- 
lation is zero); for positive correla- 
tion the percent probability was 
6.68. Box makes use of an approxi- 
mate technique in which the degrees 
of freedom are reduced by multiplying
each df by a fraction, epsilon (ε),
which depends on the correlation
within rows. The upper limit of ε is 1,
which will occur only if the variances
are equal and the correlation is constant
among the Treatments. In this
case the F ratio with the usual df can
be used. In the event that just two
treatments are involved, ε equals 1 if
the variances are equal. However, in
many designs using three or more
treatments, ε will be less than 1; thus
if the usual df are employed (without 
reduction by e) an increase in Type I 
errors will occur. 
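Box's ε can be computed directly from the covariance matrix of the repeated measurements. The sketch below uses the double-centred form $\varepsilon = (\sum_i s^*_{ii})^2 / ((k-1)\sum_i\sum_j s^{*2}_{ij})$. Here $s^*_{ij}$ is the covariance matrix with its row and column means removed. When a sample covariance matrix is used, this is the estimate usually attributed to Greenhouse and Geisser. The function names and example matrices are hypothetical.

```python
def box_epsilon(cov):
    """Box's epsilon from a k x k covariance matrix of the repeated measurements."""
    k = len(cov)
    if k < 2 or any(len(row) != k for row in cov):
        raise ValueError("cov must be a square matrix of size at least 2")
    row_means = [sum(row) / k for row in cov]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand = sum(row_means) / k
    # Double-centre the matrix: remove row and column means, add back the grand mean.
    centred = [[cov[i][j] - row_means[i] - col_means[j] + grand for j in range(k)]
               for i in range(k)]
    trace = sum(centred[i][i] for i in range(k))
    sum_sq = sum(x * x for row in centred for x in row)
    return trace ** 2 / ((k - 1) * sum_sq)


def adjusted_df(k, n, eps):
    """Box-adjusted df for the Treatments and Treatments x Subjects terms."""
    return eps * (k - 1), eps * (k - 1) * (n - 1)


compound_symmetric = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
unequal = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(box_epsilon(compound_symmetric))  # 1.0 up to rounding: equal variances, constant r
print(box_epsilon(unequal))             # about 0.576, between 1/(k - 1) = 0.5 and 1
```

The adjusted df are then used in place of the usual k − 1 and (k − 1)(n − 1) when the tabled F is consulted.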

Geisser and Greenhouse (1958) 
have extended Box’s result to develop
a conservative F test of treatments.
They show that ε ≥ 1/(k − 1)
and thereby determine the lower
limit for the df to be 1 and n − 1, where k
refers to the number of treatments
and n is the number of Ss. (This result
can be obtained by multiplying
the df for treatments (k − 1) and for
treatments × subjects [(k − 1)(n − 1)]
each by 1/(k − 1).) Thus the F test
with df of 1 and n − 1 can be employed
when unequal covariation
occurs with one group of Ss. They
also develop a conservative test when
more than one group is involved. In
this case the df for the approximate
F test of treatments are 1 and N − g, where
N is the total number of Ss and g is 
the number of groups. However, the 
authors maintain that the use of the 
lower limit may be too conservative. 
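The df rules in this paragraph reduce to simple arithmetic. In this sketch the function names are hypothetical. The usual within-Ss df (k − 1)(N − g) for g groups of Ss is an assumption about the standard mixed-design error term, not something stated in the text.

```python
def usual_df(k, n_subjects, n_groups=1):
    """Usual df for the F test of treatments: k - 1 and (k - 1)(N - g)."""
    if k < 2 or n_subjects <= n_groups:
        raise ValueError("need at least 2 treatments and more Ss than groups")
    return k - 1, (k - 1) * (n_subjects - n_groups)


def conservative_df(k, n_subjects, n_groups=1):
    """Geisser-Greenhouse lower bound: both usual df multiplied by 1/(k - 1)."""
    df1, df2 = usual_df(k, n_subjects, n_groups)
    return df1 // (k - 1), df2 // (k - 1)   # exact: both are multiples of k - 1


print(usual_df(4, 12), conservative_df(4, 12))        # (3, 33) (1, 11)
print(usual_df(4, 12, 3), conservative_df(4, 12, 3))  # (3, 27) (1, 9)
```

The observed F ratio is unchanged. Only the df used to look up the critical value differ.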

Danford and Hughes (1957) argue 
for the use of the usual analysis of 
variance design, maintaining that the 
equal covariance assumption (con- 
stant correlation) is tenable for cer- 
tain experimental situations. They 
state that some experimental data 
have shown comparable correlation 
coefficients (r’s of .70 to .90).⁴ They
criticize Scheffé’s (1956) suggestion
to use Hotelling’s T² statistic for
testing the fixed main effect because 
of the above. Likewise, they indicate 
that if the equal covariance assump- 
tion is correct the power of the usual 
F test is greater (in some cases, much 
greater) than is the power of Hotel- 
ling’s test. 
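For comparison with the F test, the following is a minimal sketch of Hotelling's T² for equality of k repeated-measures means. It is built on the k − 1 successive differences between treatments. This is the textbook one-sample form, not Scheffé's or Lubin's specific procedure. The function names are hypothetical, and the test needs more Ss than there are differences.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix of the differences is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        x[r] = (m[r][size] - sum(m[r][c] * x[c] for c in range(r + 1, size))) / m[r][r]
    return x


def hotelling_t2_repeated(scores):
    """One-sample Hotelling T^2 on successive differences y_j - y_(j+1).

    Returns (T2, F, df1, df2) with F = (n - k + 1) / ((n - 1)(k - 1)) * T2.
    """
    n, k = len(scores), len(scores[0])
    p = k - 1
    if n <= p:
        raise ValueError("need more Ss than treatment differences")
    diffs = [[row[j] - row[j + 1] for j in range(p)] for row in scores]
    means = [sum(d[j] for d in diffs) / n for j in range(p)]
    # Sample covariance matrix of the differences (divisor n - 1).
    cov = [[sum((d[a] - means[a]) * (d[b] - means[b]) for d in diffs) / (n - 1)
            for b in range(p)] for a in range(p)]
    x = solve(cov, means)  # cov^-1 times the mean difference vector
    t2 = n * sum(mu * xi for mu, xi in zip(means, x))
    df1, df2 = p, n - p
    return t2, t2 * df2 / ((n - 1) * df1), df1, df2


print(hotelling_t2_repeated([[1, 2], [2, 4], [3, 3], [4, 7]]))  # T2 = F = 5.4 (approx.), df 1 and 3
```

For two treatments, T² reduces to the square of the paired t statistic, which gives an easy check on the arithmetic.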

Lubin (1954, 1957, 1958) has co- 
gently considered the repeated meas- 
urements designs, not only con- 
sidering the effects of correlated ob- 
servations but also treatment × order
interactions, and other learning or
“carry over” effects. Because of these
contaminating effects, he recommends
the use of a modification of
Hotelling’s T² test, or a nonparametric
rank-order test if one is interested
in the relative efficacy of several
treatments (unless a treatment
× order interaction is present). If
this interaction is present he advo- 
cates a matched Ss design in which 
each S receives only one treatment. 
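Lubin's specific rank-order tests are not described here. As a stand-in, the sketch below uses Friedman's test, a standard nonparametric rank-order test for k treatments given to the same n Ss. It illustrates the general approach and is not Lubin's procedure. The function names are hypothetical.

```python
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Rank one S's scores 1..k, giving tied scores their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(scores: List[List[float]]) -> float:
    """Friedman's chi-square (df = k - 1); rows are Ss, columns treatments."""
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    rank_sums = [sum(r[j] for r in rank_rows) for j in range(k)]
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

Because each S's scores are ranked separately, differences in overall level between Ss drop out of the statistic.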

Thus the F test is theoretically correct only if constant correlation among treatments is present. If only two treatments are involved and homogeneity of variance is present, it follows that the F test is always appropriate. If unequal correlation occurs, too many significant Fs will be reported. With moderate but unequal correlation among treatments, the increase in the number of significant results reported for treatment effects appears to be small, using Box’s approximation. The increase with greater correlation is unknown. However, the F test indicated by Geisser and Greenhouse allows one to make a conservative test.

⁴ The experimental data of concern are not cited, however.
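A minimal sketch of the usual F test under discussion follows. It assumes a single group of n Ss, each measured under all k treatments, and the function name is hypothetical. The error term is the treatments × subjects interaction, and the ratio is referred to F with (k − 1) and (k − 1)(n − 1) df in the usual test.

```python
from typing import List, Tuple


def repeated_measures_f(scores: List[List[float]]) -> Tuple[float, int, int]:
    """Usual treatments x subjects F test; rows are Ss, columns treatments.

    The error term is the treatments x subjects interaction, so the F ratio
    is exact only under constant correlation among treatments.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    treat_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_treat - ss_subj  # treatments x subjects
    df_treat, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_treat / df_treat) / (ss_error / df_error)
    return f, df_treat, df_error
```

For the scores [[1, 2, 3], [2, 3, 5], [3, 5, 6]], the treatments SS is 32/3 and the error SS is 2/3, giving F = 32 on 2 and 4 df. The conservative Geisser and Greenhouse test would instead refer the same F to 1 and 2 df.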

In conclusion, it is apparent that multiple defects are present in repeated measurements designs. The design using randomization of the order of treatments avoids the numerous defects, but σ may be quite large. With randomization the correlation effect should be minimized. The Lindquist “mixed” design overcomes some of the defects, but the F test of treatments, even though unbiased, may not be a valid F test and may be meaningless. The matched Ss design recommended by Lubin would appear to be the safest procedure if enough Ss are available. However, the important point to stress is that an investigator who resorts to a repeated measurements design should be aware of the possible distortions and be able to defend his assumptions concerning the order effect, the order interactions, and the correlated observations.


SUMMARY 


Six types of analysis of repeated 
measurements designs are indicated. 
The effects of order, interactions con- 
taining order, and correlated obser- 
vations on the components of vari- 
ance and analysis of variance tests of 
significance are considered. The first 
two act, in general, to inflate the error 
estimates and thus to increase the 
probability of a Type II error. The 
correlated observations (if unequal) 
have the opposite effect, i.e., increase 
the probability of a Type I error. It 
is suggested that caution be exercised 
in the use of these designs; randomi- 
zation of the order of treatments or 
matched subjects appear to be the 
safest procedures. The Lindquist 
Type II “mixed” design overcomes 
some defects but is not completely 
appropriate. 


REFERENCES 


ALEXANDER, H. W. The estimation of reliability when several trials are available. Psychometrika, 1947, 12, 79-99.

ANDERSON, R. L., & BANCROFT, T. A. Statistical theory in research. New York: McGraw-Hill, 1952.

BOX, G. E. P. Some theorems on quadratic forms applied in the study of analysis of variance problems: II. Effects of inequality of variance and correlation between errors in the two way classification. Ann. math. Statist., 1954, 25, 484-498.

CORNFIELD, J., & TUKEY, J. W. Average values of mean squares in factorials. Ann. math. Statist., 1956, 27, 907-949.

DANFORD, M. B., & HUGHES, H. H. Mixed model analysis of variance, assuming equal variances and equal covariances. USAF Sch. Aviat. Med. Rep., 1957, No. 57-144.

GAITO, J. Statistical dangers involved in counterbalancing. Psychol. Rep., 1958, 4, 463-468. (a)

GAITO, J. The single latin square design in psychological research. Psychometrika, 1958, 23, 369-378. (b)

GARRETT, H. E., & ZUBIN, J. The analysis of variance in psychological research. Psychol. Bull., 1943, 40, 233-267.

GEISSER, S., & GREENHOUSE, S. W. An extension of Box’s results on the use of the F distribution in multivariate analysis. Ann. math. Statist., 1958, 29, 885-891.

GOURLAY, N. F-test bias for experimental designs of the latin square type. Psychometrika, 1955, 20, 273-287.

GRANT, D. A. On “The analysis of variance in psychological research.” Psychol. Bull., 1944, 41, 158-166.

GRANT, D. A. The latin square principle in the design and analysis of psychological experiments. Psychol. Bull., 1948, 45, 427-442.

GREENWOOD, J. A. Analysis of variance and components of variance: Factorial experiments. Unpublished paper, USN Bureau of Aeronautics, 1956.

HARTER, H. L., & LUM, M. D. Partially hierarchal models in the analysis of variance. USAF WADC Rep., 1955, No. 55-33.

HILGARD, E. R. Methods and procedures in the study of learning. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley, 1951. Pp. 517-567.

KEMPTHORNE, O. The design and analysis of experiments. New York: Wiley, 1952.

KOGAN, L. S. Analysis of variance: Repeated measurements. Psychol. Bull., 1948, 45, 131-143.

KOGAN, L. S. Variance designs in psychological research. Psychol. Bull., 1953, 50, 1-40.

LINDQUIST, E. F. Goodness of fit of trend curves and significance of trend differences. Psychometrika, 1947, 12, 65-78.

LINDQUIST, E. F. Design and analysis of experiments in psychology and education. New York: Houghton Mifflin, 1953.

LUBIN, A. Are non-parametric tests distribution-free? Unpublished paper, Walter Reed Army Institute of Research, 1954.

LUBIN, A. Some rank-order tests for trend in a set of correlated means. Unpublished paper, Walter Reed Army Institute of Research, 1957.

LUBIN, A. On the repeated measurements design. Unpublished paper, Walter Reed Army Institute of Research, 1958.

McNEMAR, Q. On the use of latin squares in psychology. Psychol. Bull., 1951, 48, 398-401.

McNEMAR, Q. Psychological statistics. New York: Wiley, 1955.

PETERS, C. C. Interaction in analysis of variance interpreted as intercorrelation. Psychol. Bull., 1944, 41, 287-299.

SCHEFFÉ, H. A mixed model for the analysis of variance. Ann. math. Statist., 1956, 27, 23-36.

WILK, M. B., & KEMPTHORNE, O. Non-additivities in a latin square design. J. Amer. Statist. Ass., 1957, 52, 218-236.


(Received August 6, 1959) 


Psychological Bulletin 
1961, Vol. 58, No. 1, 55-79 


HUMAN TRACKING BEHAVIOR¹


JACK A. ADAMS 
University of Illinois


The subject matter of this paper is a critical review and analysis of research, issues, and points of view associated with human behavior in one- and two-dimensional tracking tasks. Tracking tasks have never been given an explicit definition, and one purpose of this paper is to advance, tentatively, the general bounds of a tracking situation. For readers unfamiliar with the area, a temporary working definition which enjoys the consensus of most psychologists is as follows:


1. A paced (i.e., time function) externally programed input or command signal defines a motor response for the operator, which he performs by manipulating a control mechanism.

2. The control mechanism generates an output signal.

3. The input signal minus the output signal is the tracking error quantity, and the operator’s requirement is to null this error. The mode of presenting the error to the operator depends upon the particular configuration of the tracking task but, whatever the mode, the fundamental requirement of error nulling always prevails. The measure of operator proficiency ordinarily is some function of the time-based error quantity.
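The definition above leaves the proficiency measure open as "some function" of the time-based error. The sketch below assumes equally spaced samples of the input and output signals and computes three candidate measures: root-mean-square error, mean absolute error, and time on target. The choice of these functions, the tolerance band, and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def tracking_error(inputs: Sequence[float], outputs: Sequence[float]) -> List[float]:
    """Error quantity at each sample: input signal minus output signal."""
    return [i - o for i, o in zip(inputs, outputs)]


def rms_error(errors: Sequence[float]) -> float:
    """Root-mean-square error over the trial."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mean_absolute_error(errors: Sequence[float]) -> float:
    """Average error magnitude (proportional to integrated absolute error)."""
    return sum(abs(e) for e in errors) / len(errors)


def time_on_target(errors: Sequence[float], tolerance: float) -> float:
    """Proportion of samples whose |error| lies within the tolerance band."""
    return sum(abs(e) <= tolerance for e in errors) / len(errors)
```

Squared error weights large excursions heavily, while time on target ignores how far outside the band the operator strays. Different measures can therefore rank the same performances differently.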


The usual tracking task has a visual display, but there is no necessity for this; on occasion, auditory tracking tasks have been devised (Forbes, 1946; Humphrey & Thompson, 1952a, 1952b, 1953). The simplest and best-known visual tracking task is the Rotary Pursuit Test (Melton, 1947), which employs a repetitive input signal. Although investigations using the Rotary Pursuit Test are not ordinarily included in the body of research considered to study tracking behavior per se, the test is nevertheless an unequivocal example of the breed. Tracking studies typically use more elaborate apparatus which allows controlled manipulation of such variables as the function for the input signal, scale factors, mathematical transformations of the output signal, characteristics of the control mechanism, etc.

¹ This research was supported by the United States Air Force under Contract No. AF 49(638)-371, monitored by the Air Force Office of Scientific Research of the Air Research and Development Command. A number of psychologists read this manuscript in draft form and contributed to its improvement. The detailed and critical comments of F. C. Bartlett, E. A. Fleishman, C. B. Gibbs, N. B. Gordon, J. A. Leonard, and A. T. Welford were particularly appreciated.

While investigations of tracking behavior might legitimately be subsumed under the time-honored rubric “motor skills,” this label is misleading because, by implication and textbook tradition, it hints that motor behavior such as tracking is dissociated from so-called “higher processes.” British investigators in particular have analyzed the acquired ability to predict input stimulus sequences as a key intervening response class in determining the proficiency of the measured motor responses in tracking tasks, thus emphasizing the interlacing of “higher” and “lower” processes. These British studies will be discussed in detail later, but passing mention of them at the outset seems worthwhile for establishing the archaic connotations of “motor skills.” Research by Adams (1957), Fleishman (1954, 1957a, 1957b, 1958), and Fleishman and Hempel (1954, 1955, 1956) on variables influencing individual differences in motor behavior also documents the inherent complexity of the response totality elicited in motor tasks. Helson (1949), in discussing variables influencing the subject’s standard of excellence in a tracking task, includes perceptual and motivational states, in addition to motor factors, as influential determiners of motor behavior.


BASIC TERMINOLOGY AND FRAME OF REFERENCE


Independent variables influencing 
tracking behavior will be divided into 
two classes: task variables and pro- 
cedural variables. Task variables are 
machine-centered. They are the 
physical values of the tracking de- 
vice, and they include such factors as 
the nature of the input signal, con- 
figuration of the display, design of 
the control system, mathematical 
transformations relating control dis- 
placement and changes in the output 
signal, etc. Procedural variables are 
man-centered. They are manipulable 
nontask quantities, and examples of 
them are instructions, number of 
practice trials, length of the practice 
trial, and time between trials. Also, the indicants displayed to the operator will be assumed to be simple elements, such as needles or dials, pointers, dots on cathode ray tubes, etc. Special problems that arise when the display is perceptually complex and requires the interpretation of forms, shapes, colors, etc. will be ignored.


THE TRADITION OF ENGINEERING 
PSYCHOLOGY 


A dominant influence in tracking research has been the experimental work of engineering psychology, with the emphasis largely on the relations between measures of tracking behavior and task variables. The engineering psychologist has as his goal the prediction of the characteristics of man-machine systems, and this goal requires careful attention to the task variables which influence the operator. Representative examples of several hundred task-oriented tracking experiments are studies of control loadings (Bahrick, 1957; Bahrick, Bennett, & Fitts, 1955; Bahrick, Fitts, & Schneider, 1955; Briggs, Bahrick, & Fitts, 1957; Howland & Noble, 1953; Weiss, 1954), input signal characteristics (Hartman, 1957; Hartman & Fitts, 1955; Noble, Fitts, & Warren, 1955), the magnitude of lag between control movement and system output (Conklin, 1957; Warrick, 1949), the effects of visual noise (Briggs & Fitts, 1956; Briggs, Fitts, & Bahrick, 1957), mathematical transformations of the output signal (Briggs, Fitts, & Bahrick, 1958), and compensatory vs. pursuit tracking (Chernikoff & Taylor, 1957; Poulton, 1952b). Task variables, because of their role in determining the behavioral requirements for the operator, are an important class of variables for psychology, and engineering psychologists have made a notable contribution in directing attention toward neglected determiners of human behavior. However, this strong task orientation has led to the neglect of procedural variables that influence the operator, and thus the efficiency of the total man-machine system. A recent article (Taylor, 1957) has clearly stated this emphasis:

... human engineering aims first at building better systems and secondarily at improving the lot of the operator. Thus, whereas conventional psychology, both basic and applied, is anthropocentric, human engineering is mechanocentric (p. 252).


This statement succinctly summarizes the task-oriented approach of engineering psychology and expresses a downgrading of procedural variables related to training, retention, fatigue, motivation, etc. It is forgotten, or intentionally neglected, that the engineering psychologist must, over the long run, develop the capability to predict the effectiveness of a man-machine system for different states of the operator, and this means a strict scientific accounting of a broad range of variables which influence man. There are few who underestimate the importance of task variables in determining the behavior of a man-machine system, but there seems to be no sound justification for relegating procedural variables to a secondary status. In the beginning, an applied branch of a science might profitably concern itself with rank ordering its variables in terms of their potency in influencing a criterion (Taylor & Garvey, 1959), but this approach does not deserve to be raised to a research philosophy. Sophisticated applied science, just as basic science, must work toward a precise accounting of all variables and their interrelations.

Fig. 1. The analogy commonly drawn between a closed-loop electromechanical servosystem and a human operator as an error-nulling agent in a tracking task.

Whereas general experimental psychology has often looked to traditional behavioral theory as a basis for its tracking studies, many engineering psychologists, with their mechanocentric views, have turned toward the feedback theory of closed-loop servomechanisms (Bower & Schultheiss, 1958; Brown & Campbell, 1948; Goode & Machol, 1957) as a model for a man-machine tracking system. Basically, a closed-loop servosystem is an electromechanical error-nulling system which compares an input signal with an output signal and works toward reducing the difference between them. Because error nulling is a basic characteristic of systems which include the human operator as a tracking component, some engineering psychologists view physical servotheory as a potential source of descriptive relationships for manual tracking systems. Figure 1 shows the parallel that is ordinarily drawn between a servosystem and a man-machine tracking system. The theory of servomechanisms is a method of mathematical analysis concerned with describing the output of a complex system as a function of its input, and it allows the systems analyst to state the functional characteristics of his system with some precision. The input-output relations are expressed by means of a complex ratio called the transfer function, which expresses the nature of the transformations that the system imposes on the input signal. In a system comprised of a number of components, a transfer function is determined for each component, and these can then be combined to yield an overall transfer function for the system. An important feature of these methods of system analysis is that it is not necessary to trace the signal painfully through each element of a component to compute the input-output transformation represented in the transfer function. Rather, a “black box” approach can be taken, in which input-output relationships are directly compared without attending to the many intermediate transformations which occur to the signal as it passes through the component.
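The "black box" combination rule for a linear system can be sketched briefly. Each component is represented only by its transfer function. As an assumption of the example, a transfer function is written as a ratio of polynomials in the Laplace variable s. Components connected in series are combined by multiplying their transfer functions. Evaluating the result at s = jω gives the gain and phase shift applied to a sinusoidal input of angular frequency ω. All names here are hypothetical.

```python
import cmath
from typing import List, Tuple

# A transfer function as (numerator, denominator) polynomial coefficients
# in s, highest power first; e.g. 1/(s + 1) is ([1.0], [1.0, 1.0]).
TF = Tuple[List[float], List[float]]


def poly_mul(a: List[float], b: List[float]) -> List[float]:
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def series(*components: TF) -> TF:
    """Overall transfer function of components connected in series."""
    num, den = [1.0], [1.0]
    for n, d in components:
        num, den = poly_mul(num, n), poly_mul(den, d)
    return num, den


def poly_eval(p: List[float], s: complex) -> complex:
    """Evaluate a polynomial at s using Horner's rule."""
    acc = 0j
    for c in p:
        acc = acc * s + c
    return acc


def frequency_response(tf: TF, omega: float) -> Tuple[float, float]:
    """Gain and phase (radians) of the steady-state sinusoidal response."""
    h = poly_eval(tf[0], 1j * omega) / poly_eval(tf[1], 1j * omega)
    return abs(h), cmath.phase(h)
```

For example, an amplifier with gain 2 in series with a first-order lag 1/(s + 1) gives 2/(s + 1). At ω = 1 this has a gain of √2 and a phase lag of π/4. The sketch covers only series connection. A closed negative-feedback loop like the one in the servosystem analogy combines as G/(1 + GH) instead, which is not implemented here.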

The servosystem analyst is con- 
cerned with input-output relations as 
they are manifested in two domains: 
time and frequency. In the time 
domain the time-varying character- 
istics of the system are described in 
terms of overshooting, undershooting, 
oscillations, steady state errors, etc., 
in response to a step input. In the 
frequency domain the output of the 
system is examined for transforma- 
tions of a sinusoidal input after the 
transients have died out. Finally, and perhaps most important in this brief exposition of the methods of servosystem analysis, the entire mathematical structure is founded on the assumption of linearity. Fundamentally, this assumption means that the system obeys the superposition theorem, which states that the system response to the sum of a set of inputs is equal to the sum of the responses made to each input separately. This means that the performance of the system can be predicted for any complex input, provided we know the response of the system to each of the constituent inputs comprising the complex input. Another implication of the linearity assumption is that the system will accurately reproduce input sinusoidal frequencies after transients have died out, although there may be phase shift and amplitude change. Furthermore, it is implicit that the output of the system is solely a function of the input and that this functional relationship is described by the transfer function. For example, the output is not a function of such variables as time, where the system might perform one class of transformations on the inputs at time t and another class at a later time.
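The superposition property can be checked numerically. The sketch below uses a simple discrete-time first-order lag as the linear system and adds output saturation (clipping) as an illustrative nonlinearity. Both systems and all function names are hypothetical examples, not models of a human operator. The check compares the response to a summed input with the sum of the separate responses.

```python
from typing import Callable, List, Sequence

System = Callable[[Sequence[float]], List[float]]


def first_order_lag(x: Sequence[float], a: float = 0.5) -> List[float]:
    """Linear discrete-time lag: y[t] = a*y[t-1] + (1-a)*x[t], starting from 0."""
    y, prev = [], 0.0
    for v in x:
        prev = a * prev + (1 - a) * v
        y.append(prev)
    return y


def saturating_lag(x: Sequence[float], limit: float = 1.0) -> List[float]:
    """The same lag with its output clipped to +/- limit (a nonlinearity)."""
    return [max(-limit, min(limit, v)) for v in first_order_lag(x)]


def obeys_superposition(system: System, x1: Sequence[float],
                        x2: Sequence[float], tol: float = 1e-9) -> bool:
    """True if response(x1 + x2) equals response(x1) + response(x2)."""
    summed = system([a + b for a, b in zip(x1, x2)])
    separate = [a + b for a, b in zip(system(x1), system(x2))]
    return all(abs(s - t) <= tol for s, t in zip(summed, separate))
```

For example, with x1 = x2 = [1.0] * 5 the lag alone passes the check. The clipped version fails it, because the summed input drives the output into saturation while neither separate input does.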

Ellson’s paper (1949) best expresses the hope of some engineering psychologists that the transfer function
for the human operator might be 
determined and provide an analytical 
means of predicting the performance 
of the total man-machine system, 
and of optimizing the performance of 
the system by designing hardware 
components to complement the re- 
sponse characteristics of man. This 
goal of mathematically describing 
the characteristics of man and his 
machine elements is scientifically ad- 
mirable but, regrettably, it was 
doomed from the beginning by the 
massive barrier of the linearity assumption. It is almost self-evident that the human operator, with his intricate adaptive propensities toward learning, fatiguing, motivational shifts, etc., is a nonlinear component of a system, and that there is only a faint possibility of finding a transfer function which can be used by
system designers to optimize the per- 
formance of a system by capitalizing 
on the transformations that man im- 
poses on a signal as it enters the 
receptors, makes passage through the 
organism, and is emitted anew by the 
responding effector system (Birming- 
ham & Taylor, 1954; Ellson, 1949; 
Fitts, 1951; Searle & Taylor, 1948). 
Birmingham and Taylor (1954) have nicely expressed this matter of nonlinearity for the tracking human operator:

This adaptability on the part of the man 
is, of course, a great boon to the control de- 
signer, since he can rely upon the human to 
make the most of any control system, no mat- 
ter how inadequate. It is this which probably 
constitutes the most important single reason 
for using men in control loops. Yet, this very 
adjustability renders any specific mathemati- 
cal expression describing human behavior in 
one particular control loop quite invalid for 
another man-machine arrangement. This sug- 
gests strongly that “the human transfer func- 
tion” is a scientific will-o’-the-wisp which can 
lure the control system designer into a fruitless 
and interminable quest (p. 1752). 

Fitts (1951) has reported on certain 
limited conditions where human re- 
sponse appears to approximate linear- 
ity but, in general, it would seem 
that the nonlinearities of human be- 
havior negate the usefulness of the 
servomodel and its mathematical 
techniques as a serious theoretical 
instrument for behavior theory or as 
a tool for the design of man-machine 
systems. Nonlinearities do, of course, 
occur in some physical systems but 
the assumption of linearity is met 
sufficiently well and often to make 
the theory of important value for the 
physical sciences. This could hardly 
be said for psychology where non- 
linearities are an inherent, and in- 
deed the most interesting and chal- 
lenging, aspect of the human operator. 
It must be concluded, therefore, that present-day servotheory stands in an analogous, not a scientific, relationship to man-machine tracking systems.

Even if analytical methods eventually become available to handle the nonlinearities of closed-loop human behavior, it is unlikely that engineering psychology will be able to make effective use of them if it continues its preoccupation with task variables (Taylor, 1957; Taylor & Garvey, 1959) and underplays the role of procedural variables, which are basic determiners of dispositional states of the operator and contribute substantially to the nonlinearities. Engineering texts on servotheory (Bower & Schultheiss, 1958; Brown & Campbell, 1948; Goode & Machol, 1957) distinguish between analysis, or the description of a system in existence, and synthesis, or the prediction of the characteristics of components of the system to achieve certain objectives.
Conceivably, we might eventually 
describe a man-machine tracking 
system already in existence because 
the response characteristics of the 
human operator can be empirically 
determined for the range of inputs of 
interest and the operator states that 
prevail. However, synthesizing is 
quite different because it requires 
that we know the laws of human be- 
havior as a function of task and pro- 
cedural variables and are able to 
predict the characteristics of the hu- 
man response functions. Questions 
relating to such operator states as 
learning and fatigue most certainly 
will arise and it is evident that these 
queries will not be answerable if task 
variables are taken as the primary 
research domain of engineering psy- 
chology. Engineering psychology, it 
would seem, cannot escape the bur- 
den of the same variables and searches 
for lawfulness which traditionally 
occupy all psychologists. 

In defense of the servotheory approach to tracking, its protagonists have been engaged in a proper search for a descriptive mathematical device for man-machine tracking systems, one which includes provisions for task variables and for the properties of response outputs to inputs which are continuous with respect to time. Contemporary behavior theories ordinarily employ measures of behavior, such as frequency and latency, which can be defended as operationally meaningful dependent variables but which are gross summary indices of complex behavior sequences and often do violence to the subtleties of the
ongoing behavior. Commonly, psy- 
chologists in their laboratory research 
will elicit elaborate time-based re- 
sponse sequences from an organism 
and then will ignore completely the 
time-varying characteristics of the 
responding in their measurement. In 
contrast, psychologists studying track- 
ing have recognized, almost from the 
beginning, that their dependent meas- 
ures should somehow describe the 
prominent characteristics of time- 
based response functions. And, be- 
cause contemporary behavior theories 
give no attention to time functions, 
tracking psychologists appear to have 
suffered disenchantment and have 
turned to the mathematical schema 
of closed-loop servotheory, inade- 
quate though it is, because it grap- 
ples directly with the measurement 
and description of time-varying quan- 
tities. The fact that servotheory is of 
little value for quantitative descrip- 
tion of man-machine tracking sys- 
tems should not allow us to forget 
that the interest in it has reflected a 
legitimate concern about measure- 
ment issues and variables which are 
important for the response phenom- 
ena under investigation. 


TRADITION OF GENERAL EXPERIMENTAL PSYCHOLOGY


Basic research on tracking by general experimental psychologists has not had the strong emphasis on task variables. Frequently, in basic research, the experimental task has been a convenient means of eliciting a response class for the purpose of manifesting underlying behavioral processes which are of theoretical rather than practical interest, and consequently tracking tasks have not been studied for their own sake. Examples of this approach are many of the tracking studies on the Rotary Pursuit Test with interest in fatigue-like effects or, more exactly, the implications of Hull’s (1943) expressions of reactive and conditioned inhibition for behavior (Adams, 1956;
Adams & Reynolds, 1954; Kimble & 
Horenstein, 1948). Other studies of 
fatigue processes (Floyd & Welford, 
1953; Payne & Hauty, 1954; Siddall 
& Anderson, 1955) using tracking 
tasks have had a similar general con- 
cern and have shown little interest in 
the study of tracking for its own sake. 
The interest in task variables per se 
which has preoccupied engineering psychology has been largely absent in
the research of general experimental 
psychology. This has been a healthy 
countertrend to the task emphasis of 
engineering psychology but the ap- 
proach of using virtually any con- 
venient task to elicit a response class 
can be considered a deficiency be- 
cause it shows a lack of appreciation 
for the influence of task variables on 
behavior, and the possible interac- 
tions that can be expected to occur 
between task and procedural vari- 
ables. These studies seem to have 
implicitly assumed that behavioral 
laws will transcend particular char- 
acteristics of a task, but this is an 
unlikely possibility because of the 
extensive work in engineering psy- 
chology showing the potent influence 
of task variables on performance. 
There is good reason to expect that 
many task variables will interact with 
those variables which have been of 
interest in testing theoretical deduc- 
tions. To illustrate, if it were even- 
tually found that a major cause for 
the depressant effects of massed 
practice on the tracking response was 
that work inhibition degraded the 
quality of proprioceptive feedback, 
the behavior functions would, as a 
minimum, have to be expressed in 
relation to the interaction of inter- 
trial interval and those control sys- 
tem variables which determine pro- 
prioceptive feedback. Helson (1949), 
in a report of the Foxboro investigations, which were an early series of
systematic tracking studies, points 
out that both task and procedural 
variables are pertinent to a complete 
understanding of human behavior. 
Lewis (1953) has urged closer atten- 
tion to the relations between the 
physical organization of tasks and the 
complexities of behavior. 

An important line of tracking re- 
search, which can be subsumed under 
the rubric of general experimental 
psychology, has been dominated by 
British investigators of the Applied 
Psychology Research Unit, Cam- 
bridge University, and mainly con- 
cerns efforts to delineate the intrinsic 
characteristics of the overt motor 
tracking response, and to identify 
and assess the response classes which 
intervene between the displayed stim- 
uli and the measured motor response. 
Examples of these interests are the 
question of whether the apparently 
smooth, continuous tracking response 
is fundamentally intermittent (Cher- 
nikoff & Taylor, 1952; Craik, 1947, 
1948; Davis, 1956; Elithorn & Lawrence, 1955; Hick, 1948; Poulton, 1950; Searle & Taylor, 1948; Taylor & Birmingham, 1948; Vince, 1948a, 1948b, 1949; Welford, 1952) and the conditions under which the human operator learns to predict or anticipate changes in the input signal (Bartlett, 1951; Craik, 1947, 1948; Leonard, 1953; Poulton, 1952a, 1957a, 1957b, 1957c; Vince, 1953, 1955). These studies have manipulated both task and procedural variables and have, in many respects, been the most influential of all in improving our scientific understanding of tracking behavior, because they have attempted, in a detailed and analytical fashion, to clarify the various response facets of tracking behavior and the variables determining them. It is perhaps safe to say that these studies have been a numerical minority in tracking research, and this is unfortunate because such information stands as the foundation of any systematic empirical and theoretical organization of tracking behavior.
Neither the studies of tracking qua 
tracking which have arisen out of the 
applied interests of engineering psy- 
chology, nor the studies of theoretical 
psychology where tracking tasks have 
been used as a matter of convenience, 
can progress very far until their find- 
ings are related to the complex char- 
acteristics of tracking behavior. Ana- 
lytical tracking studies in this vein 
will be discussed in some detail in 
later sections of this paper. 


AREAS OF NEGLECT 


With some exceptions, engineering 
psychology and general experimental 
psychology have tended to gloss over 
three fundamental topics which must 
be given more attention if we are 
eventually to have the beginning of a 
theory of tracking behavior: 

1. Tracking tasks have never been 
defined other than by convention. 
Early interests in tracking behavior 
arose out of applied situations where 
a continuously generated error quan- 
tity had to be nulled by continuous 
operator movements. Laboratory 
studies of tracking follow this applied 
tradition of a continuous task, al- 
though on occasion discrete displace- 
ments of the input signal have been 
used (Craig, 1949; Ellson, Hill, & 
Craig, 1949; Rund, Birmingham, 
Tipton, & Garvey, 1957; Searle & 
Taylor, 1948; Taylor & Birmingham, 
1948; Vince, 1948b, 1949). An at- 
tempt must be made, at least in a 
preliminary way, to define the allow- 
able variations in input, both in type 
and functional form, as well as the 
characteristics of the control system 
used for responding. 

2. Not enough attention has been 
given to the emphasis (largely British) 
on a more detailed description of 
behavior in tracking. Recognition
must be given to the presence and
interaction of several overt and inter- 
vening response and stimulus classes, 
and how these factors act to deter- 
mine the characteristics of the meas- 
ured motor response. 

3. Relatively little interest has 
been expressed in multidimensional 
tracking tasks having two or more 
stimulus sources in the same or differ- 
ent sense modalities, and correspond- 
ing dimensions in the control system 
for response to each source. Most 
tracking research has been performed 
on one-dimensional tasks. The im- 
plications of various ways of organiz- 
ing multiple inputs and the control 
systems for response to them need 
more formalization and research. 

This paper will, in turn, discuss 
issues, problems, and research asso- 
ciated with each of these three areas. 


Definition of Tracking 


A one-dimensional tracking task 
will be defined by the following con- 
ditions: 

1. An externally driven input sig- 
nal defines an index of desired per- 
formance and the operator actuates 
the control system to maintain align- 
ment of the output signal of the con- 
trol system with the input signal. 
The discrepancy between the two 
signals is the error and the operator 
responds to null the error. Two basic 
types of tracking tasks are differenti- 
ated by how this error quantity is 
represented: (a) Pursuit Tracking. 
The display has two indicants. One 
is actuated by the input signal and 
the other is linked to the output 
signal of the control system. The two 
indicants are presented directly to 
the operator and he responds to null 
the error difference between them. 
(b) Compensatory Tracking. The 
error to be nulled is not the difference 
between two directly observed indi- 
cants primarily linked to the input 
and output signal as in pursuit tracking.
Instead, the error observed in
pursuit tracking is abstracted and 
used to actuate a single indicant in 
relation to a fixed reference. The 
operator's task, just as in pursuit 
tracking, is to null this error. The 
principal difference between pursuit 
and compensatory tracking is that 
with the latter the operator never 
observes the uncontaminated action 
of the input or output signal directly 
—only the error difference between 
them. 

2. The input signal is time-based 
and independent of the operator's 
response, i.e., the task is paced. A 
paced task is distinguished from 
a self-paced task where stimulus 
changes are a function of operator 
responding (Adams, 1954). 

3. The control system has con- 
straints that enforce certain transi- 
tional courses of action on the human 
operator. Instead of being able to 
move the control from a given posi- 
tion to any other position, the opera- 
tor must move through defined inter- 
vening states of the control system. 
For example, consider a one-dimen- 
sional visual tracking task using a 
pivoted control lever with hypotheti- 
cal control Positions A, B, C, and D. 
If the operator is at Position B at
time t, he has a three-choice decision
for moving the control at time t+1,
each with a probability of being correct:
he can repeat the response of
time t and leave the control at Position
B, or he can move the control to
either Position A or C. At the two 
extreme limiting positions of the con- 
trol, only two choices are involved: 
leave the control where it is, or move 
it to the position adjoining the limiting
one. By this definition, any task
where the operator has free transitional
access to all of the control system
states is prohibited from being a
tracking task.

4. The states of the input signal 
have the same transitional constraints 
as the control system. The input
signal, in changing from time t to
t+1, must change according to the
constraints defined by the control
system. Imposing the same constraints
on the input signal and the control
system gives the tracking task a
degree of feasibility for the human
operator: the input cannot take any
action which, in principle, cannot be
met by action of the control system.
This does not
mean that a tracking task must allow 
near perfect performance by the 
operator. The input function may be 
a high frequency sine wave to which 
the operator can never achieve a high 
level of proficiency, but this is a be- 
havioral matter and not a function of 
inherent design features of the task. 
Table 1 presents the permissible 
transitional states for the hypotheti- 
cal four-state tracking task discussed 
above. 
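The four defining conditions can be made concrete with a small sketch. It is an illustration under stated assumptions, not a procedure taken from the studies cited: the positions A-D are coded as the integers 0-3, the paced input takes a random walk restricted to the permitted transitions (stay, or move to an adjacent position), and the hypothetical functions `pursuit_display` and `compensatory_display` show what each type of display presents to the operator.

```python
import random

# Hypothetical coding of the four control/input positions A-D as integers 0-3.
STATES = ["A", "B", "C", "D"]


def allowed(i: int, j: int) -> bool:
    """True if moving from state i at time t to state j at time t+1 is permitted:
    stay at the same position or move to an adjacent one."""
    return abs(i - j) <= 1


def successors(i: int) -> list[int]:
    """All states reachable from state i in one time step."""
    return [j for j in range(len(STATES)) if allowed(i, j)]


def generate_input(n_steps: int, start: int = 1, seed: int = 0) -> list[int]:
    """Paced input signal: generated independently of the operator's responses
    and obeying the same transition constraints as the control system."""
    rng = random.Random(seed)
    seq = [start]
    for _ in range(n_steps - 1):
        seq.append(rng.choice(successors(seq[-1])))
    return seq


def pursuit_display(input_state: int, output_state: int) -> tuple[int, int]:
    """Pursuit: two indicants, one driven by the input, one by the output."""
    return input_state, output_state


def compensatory_display(input_state: int, output_state: int) -> int:
    """Compensatory: a single indicant showing only the error relative to a
    fixed reference (0 means on target)."""
    return input_state - output_state


if __name__ == "__main__":
    course = generate_input(10)
    print([STATES[s] for s in course])
    print(pursuit_display(2, 1), compensatory_display(2, 1))
```

Because the input is drawn only from permitted successors, every input change can in principle be matched by a single permitted move of the control, which is the feasibility property of condition 4.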

This definition is general and does 
not specify the characteristics of the 
input signal or the control system, 
other than indicating certain transi- 
tional restraints for both. The input 
states and the responses to them can 
be discrete or continuous, and the 
input can have any degree of regu- 
larity from nearly random (true 
randomness is denied by conditional 
restraints of the type shown in Table 
1) to completely repetitive. The use 
of discrete states of the input signal 
deserves more than the passing at- 
tention it has been given in the past 
because they are particularly ame- 
nable to statistical structuring in 
terms of first and higher order prob- 
abilities (with the restraints noted). 
Another advantage of discrete inputs
is that their duration is easily
manipulable, making the number of events
per unit of time an important dimension
for investigation. This time
variable has been termed the “speed
or pacing factor” (Adams, 1954;
Conrad, 1951, 1954; Wagner, Fitts,
& Noble, 1954) and is analogous to
the number of cycles per second when a
continuous input is used. One promising
measure expressing the statistical
coherency of a discrete input
signal and the duration of its events
is the informational measure of bits
per unit of time (Shannon & Weaver,
1949). The rate of change, as well as
higher derivatives, can also be a
variable for discrete input events, but
no attempts have yet been made to
explore these more complex dimensions.


TABLE 1


MATRIX GOVERNING THE ALLOWED TRANSITIONAL
STATES FOR THE INPUT SIGNAL AND THE CONTROL SYSTEM

                        State at time t+1
  State at time t     A      B      C      D
        A             Yes    Yes    No     No
        B             Yes    Yes    Yes    No
        C             No     Yes    Yes    Yes
        D             No     No     Yes    Yes

Note. The matrix represents a hypothetical four-state
one-dimensional tracking task. Cells marked with
“Yes” indicate permissible transitions from the ith
state at time t to the jth state at time t+1. “No” entries
are absolute constraints and signify the denial of transition
to a jth state from a prior ith state.
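The bits-per-unit-of-time measure mentioned above can be illustrated for a discrete input with first-order statistical structure. The sketch assumes, purely for illustration, that from each state the input moves with equal probability to any permitted successor of the four-state task; the information per event is then the entropy rate of the resulting Markov chain, and multiplying by the pacing factor (events per second) gives bits per second. The function names are hypothetical.

```python
import math


def uniform_constrained_matrix(n_states: int = 4) -> list[list[float]]:
    """Hypothetical first-order structure: from each state the input moves with
    equal probability to any permitted successor (same state or an adjacent one)."""
    P = []
    for i in range(n_states):
        succ = [j for j in range(n_states) if abs(i - j) <= 1]
        P.append([1.0 / len(succ) if j in succ else 0.0 for j in range(n_states)])
    return P


def stationary(P: list[list[float]], iters: int = 1000) -> list[float]:
    """Long-run state probabilities, found by repeatedly applying P (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def bits_per_event(P: list[list[float]]) -> float:
    """Entropy rate of the chain: average uncertainty, in bits, about the next
    state given the current one."""
    pi = stationary(P)
    return -sum(
        pi[i] * p * math.log2(p) for i, row in enumerate(P) for p in row if p > 0
    )


def bits_per_second(P: list[list[float]], events_per_second: float) -> float:
    """Combine statistical structure with the pacing factor (events per second)."""
    return bits_per_event(P) * events_per_second


if __name__ == "__main__":
    P = uniform_constrained_matrix()
    print(round(bits_per_event(P), 3))
    print(round(bits_per_second(P, 2.0), 3))
```

Under these assumptions the long-run state probabilities are .2, .3, .3, .2, giving about 1.35 bits per event instead of the 2 bits of an unconstrained four-choice input; the transitional constraints themselves remove information, and the pacing factor scales what remains.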


The Complexity of Behavior in One- 
Dimensional Visual Tracking 


The purpose of this section is to 
discuss some of the characteristics 
of the response classes which can be 
identified in one-dimensional visual 
tracking, as well as the issues sur- 
rounding them. Visual tracking will 
be analyzed because almost all tracking
research has used the visual modality.
However, whatever broad
empirical and theoretical conceptualizations
of tracking behavior eventually
mature, it will be necessary to
account for the characteristics
of tracking in other sense modalities
too. But since other modalities
such as audition have received only
exploratory attention (Humphrey & 
Thompson, 1952a, 1952b, 1953), it 
seems unduly speculative at this 
time to include them. 

Rather than the servotheory ap- 
proach which has been the frame of 
reference of some investigators, an 
attempt will be made to demonstrate, 
on the basis of the available experi- 
mental evidence, that tracking be- 
havior involves a linked chain of 
overt and internal stimuli and re- 
sponses and is much more complex 
than implied by the prominent error- 
nulling characteristics of the servo- 
analogy. While the servoanalogy is 
adequate enough for its schematic 
purposes, the behavioral phenomena 
cannot be viewed so simply. There 
are three major areas for discussion: 
the observing response which orients 
receptors to sense stimulus events on 
the display, the prediction responses 
where the operator learns to antici- 
pate future characteristics of the in- 
put signal, and the hypothesis that 
the measured motor response, even 
in continuous tracking, is intermittent
rather than the smooth, graded
movement it might appear to be to a
casual observer.

Most of the phenomena will be dis- 
cussed in greatest detail under the 
heading of pursuit tracking, and the 
presence of the same or similar phe- 
nomena in compensatory tracking 
will, in most cases, be obvious. Be- 
havioral considerations which are 
uniquely characteristic of compensa- 
tory tracking will be treated sepa- 
rately. 


PURSUIT TRACKING


Observing Response 

The displayed indicants driven by
the input and output signals, as well
as the error difference between them,
are sensed by means of the observing
response. These three environmental
quantities each play an important
role in pursuit tracking and their 
moment-to-moment state is sampled 
as the observing response orients the 
receptors to them. The input indi- 
cant is the desired state, the error 
difference between the input and out- 
put signal represents how well the 
desired state is achieved, and the 
output indicant gives knowledge of 
results on how specific sequences of 
motor movements are represented on 
the display. Some attention
has been given to the general role of
the observing response (Wyckoff, 
1952), but within the context of 
tracking it is considered as having 
two functions: head and/or eye 
movements to direct the visual re- 
ceptors to spatially separated stim- 
uli, and the discrimination of stim- 
ulus change. The head and/or eye 
movements can be considered overt 
aspects of the observing response and 
potentially measurable (Mackworth 
& Mackworth, 1958). However, the 
discrimination function of the observing
response is an inferred phenomenon,
with its locus unspecified.

Common experience dictates the 
necessity for an observing response 
but there is also experimental evidence
which documents its importance.
Adams (1955), using the
Rotary Pursuit Test, found that
repeatedly activating the visual
observing response independently of
the arm-hand goal response, an
operation which presumably served
to fatigue the observing response,
resulted in a goal response decrement.
This permits the inference that the
performance level of the goal response
is partly determined by the
strength of the intervening observing
response. Another relevant line of
evidence is a study by Poulton (1952b)
which found that pursuit
tracking performance deteriorated
when the two pointers on the display
were increased in their spatial
separation. One interpretation of this
finding is that the greater spatial
separation required more extensive
orienting of the observing response,
with the result that less time, on the
average, was devoted to each pointer.
Viewing the observing response as the 
mechanism by which stimuli are 
sampled, the wider the spatial sepa- 
ration the less frequently each source 
of environmental stimuli is sampled 
and the less likely that an appropriate 
response will be made. Bearing on 
this sampling function of the observ- 
ing response is a vigilance experiment 
by Jerison and Wallis (1957) where 
it was found that the scanning of 
three stimulus sources resulted in a 
lower rate of detecting aperiodic 
stimulus change than when only one 
source had to be watched. 
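The sampling interpretation of Poulton's spatial-separation finding can be put as simple arithmetic. In this hypothetical two-pointer scanning model, every parameter value is an illustrative assumption rather than a measured quantity: the observer alternates fixations between the pointers, each fixation lasts a fixed dwell time, and each shift of gaze costs a time that grows with the angular separation.

```python
def sampling_rates(separation_deg: float,
                   dwell_s: float = 0.30,
                   base_shift_s: float = 0.05,
                   shift_s_per_deg: float = 0.002) -> tuple[float, float]:
    """Hypothetical scanning model for two pointers.

    Returns (looks at each pointer per second, fraction of time spent
    actually looking at a pointer). All default values are assumptions.
    """
    shift_s = base_shift_s + shift_s_per_deg * separation_deg
    cycle_s = 2 * (dwell_s + shift_s)   # pointer 1, shift, pointer 2, shift back
    looks_per_second = 1.0 / cycle_s
    fraction_on_pointers = (2 * dwell_s) / cycle_s
    return looks_per_second, fraction_on_pointers


if __name__ == "__main__":
    for sep in (0, 10, 40):
        rate, frac = sampling_rates(sep)
        print(f"{sep:>3} deg: {rate:.2f} looks/s per pointer, {frac:.0%} of time on pointers")
```

Whatever the exact constants, any model in which gaze shifts take longer for wider separations yields fewer samples of each pointer per second, which is the mechanism the interpretation proposes.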


Prediction Responses 


The input signal in pursuit track- 
ing actuates an indicant which is 
directly observed by the operator. 
To the extent that the operator can 
predict the regularities inherent in 
this input signal he will be able to 
anticipate the correct response movement
and initiate it at a time that minimizes
error. In the absence of a predictive
capability the operator must
wait for the change in the input signal 
to actually occur on the display, with 
the result that his response will gen- 
erate tracking error as a function of a 
delay of at least one reaction time 
interval. 

Helson (1949) and his associates, in 
their Foxboro studies of tracking 
during World War II, were perhaps 
the first to suggest that prediction 
behavior is manifest in reaction time 
values far less than those obtained in 
classical reaction time experiments. 
Bartlett (1951) has written an excellent
paper on the role of anticipatory
behavior which seems to be little
known and referenced in the United
States. The most extensive research 
on the prediction of directional course 
changes in the input signal has been 
by Poulton (1952a, 1952b, 1957a, 
1957b, 1957c), and he distinguishes 
between two general classes of pre- 
diction: (a) receptor anticipation, 
which is analogous to the foreperiod 
of the classical simple reaction time 
experiment where a preparatory signal
is presented to the operator in ad- 
vance and establishes a “set” for re- 
sponse, and (b) perceptual anticipa- 
tion, where no advance information 
is intentionally given each time but 
the operator nevertheless is able to 
predict the course of future signals on 
the basis of his past experience. It is 
this latter type of anticipation which 
is of greatest interest in tracking in 
that any knowledge of a future state 
of the input signal must be an ac- 
quired or learned prediction; the 
definition of a tracking task does not 
provide for foreknowledge of a state 
of the input signal. In one study 
(1952b) Poulton evaluated anticipa- 
tion in pursuit tracking as a function 
of practice and two levels of input 
complexity—a simple harmonic mo- 
tion and a complex harmonic course. 
Taking an anticipation of change in 
the input signal as a response of dura- 
tion less than the expected reaction 
time of about .20 seconds, Poulton 
found that the subjects were predict- 
ing the simple harmonic course both 
early and late in practice, and that 
the success of prediction was a posi- 
tive function of practice. Although 
overall tracking error decreased with 
practice on the complex input course, 
there was no evidence for improve- 
ment in anticipation and Poulton 
concluded that the improvement was 
largely attributable to increased man- 
ual dexterity. In this study, Poulton 
also investigated the smoothness of 
tracking, defined by the number of 
unnecessary discrete changes of speed
that were made. The fewer the num- 
ber of such changes, the better the per- 
formance. With the simple harmonic 
course, it was found that smooth- 
ness of response increased with prac- 
tice but no such changes were found 
for the complex input. Poulton 
viewed his measure of smoothness as 
an additional index of anticipation 
because, when the operator was not 
anticipating, he would tend to wan- 
der off course and his tracking record 
would show a greater number of 
corrective movements. He observes 
that smoothness is a less sensitive 
measure of beneficial anticipation 
than response time because the op- 
erator may be tracking with a large lag 
but nevertheless tracking smoothly. 
Yet, the fact that the subjects tracked 
most smoothly for the simple harmonic
input course which also produced the
greatest degree of anticipation sug- 
gested that the covariation of these 
two measures reflects the same under- 
lying ability to predict stimulus 
change in direction. 
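Poulton's two indices can be sketched as scoring functions. The anticipation criterion follows the text (a response latency shorter than the expected reaction time of about .20 seconds); the smoothness count is a simplification that counts every velocity change above a hypothetical threshold, whereas Poulton counted only unnecessary changes. The names and the sample data in the usage line are illustrative.

```python
# Criterion described above: a response faster than the expected reaction
# time (about .20 s) is scored as an anticipation.
EXPECTED_RT_S = 0.20


def is_anticipation(latency_s: float, expected_rt_s: float = EXPECTED_RT_S) -> bool:
    """latency_s: time from a change of direction in the input course to the
    corresponding response change (negative if the response came first)."""
    return latency_s < expected_rt_s


def anticipation_rate(latencies_s: list[float]) -> float:
    """Proportion of course changes that were anticipated."""
    if not latencies_s:
        return 0.0
    return sum(is_anticipation(x) for x in latencies_s) / len(latencies_s)


def count_speed_changes(positions: list[float], dt: float, threshold: float) -> int:
    """Smoothness index (lower means smoother): number of times the control
    velocity changes by more than `threshold` between successive samples.
    Simplification: all such changes are counted, necessary or not."""
    velocities = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    return sum(1 for v0, v1 in zip(velocities, velocities[1:]) if abs(v1 - v0) > threshold)


if __name__ == "__main__":
    print(anticipation_rate([-0.05, 0.15, 0.25, 0.35]))
    print(count_speed_changes([0, 1, 2, 4, 6, 6], 1.0, 0.5))
```

Scored this way, an operator who tracks with a long but steady lag gets a low anticipation rate yet also a low speed-change count, which is why Poulton judged smoothness the less sensitive index.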

Another study by Poulton (1952a) 
used the same pursuit tracking ap- 
paratus as in his previous study 
(1952b) and investigated the accu- 
racy with which an operator could 
predict the position of the input indi- 
cant for various amounts of time in 
the future. At the sound of a hammer 
blow the operator had to move the 
output indicant to the position an- 
ticipated for the input indicant when 
a bell sounded .50, 1.5, or 3.5 seconds 
later. This procedure was regularly 
repeated and resulted in a series of 
discrete responses predicting the posi- 
tion of the input indicant. The accu- 
racy of prediction was better than 
chance for both simple harmonic and 
complex harmonic inputs, with the 
accuracy being greater for simple 
harmonic motion. On the basis of 
these experiments, Poulton concluded
that course anticipation is an important
determiner of the overall
proficiency level in pursuit tracking. 
He hypothesized that higher input
speeds place a greater premium on
prediction: as the speed of the input
signal increases, a subject who fails
to anticipate, and so waits for the
stimulus change to actually occur
before responding, lets a greater
segment of the input course pass
during his reaction time, and a
larger error develops. An excellent
review of the role of prediction in
tracking and other types of
visual-motor tasks has been
published by Poulton (1957b).
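Poulton's argument about input speed can be checked with a worked case. The sketch assumes the simplest non-anticipating operator: one who reproduces a sine-wave input course exactly but only after a pure delay of one reaction time (taken here as .20 seconds). The error is then A sin(wt) - A sin(w(t - lag)), whose peak is 2A|sin(pi f lag)|, approximately the peak input velocity 2 pi f A multiplied by the lag. The function names are hypothetical.

```python
import math


def peak_lag_error(amplitude: float, freq_hz: float, lag_s: float) -> float:
    """Peak error for an output that copies a sine input exactly but lag_s late:
    max |A sin(wt) - A sin(w(t - lag))| = 2A |sin(pi f lag)|."""
    return 2 * amplitude * abs(math.sin(math.pi * freq_hz * lag_s))


def simulated_peak_lag_error(amplitude: float, freq_hz: float, lag_s: float,
                             duration_s: float = 10.0, dt: float = 0.001) -> float:
    """The same quantity found by brute force over a sampled record."""
    w = 2 * math.pi * freq_hz
    n = round(duration_s / dt)
    return max(
        abs(amplitude * math.sin(w * k * dt) - amplitude * math.sin(w * (k * dt - lag_s)))
        for k in range(n)
    )


if __name__ == "__main__":
    for f in (0.25, 0.5, 1.0):  # input speeds in cycles per second
        print(f, round(peak_lag_error(1.0, f, 0.20), 3))
```

With the amplitude and lag held fixed, doubling the input frequency roughly doubles the peak error, so the same reaction-time delay costs more at higher input speeds.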

A series of investigations by Gotts- 
danker (1952a, 1952b, 1955, 1956) is 
closely related to those of Poulton. 
Gottsdanker’s studies were concerned 
with the prediction of velocities and 
accelerations of input rather than 
directional changes in the course, and 
were subsumed under the label pre- 
diction motion. The experimental ap- 
proach required the subject to track a 
continuous input viewed through a 
narrow slit. The input was printed on 
paper in the form of parallel lines 5 
millimeters apart, and the subject 
responded by trying to keep a pencil 
point between the two lines. He was 
told that when the input disappeared 
he was to project its path into the 
future as if he were attempting to 
follow an airplane that had gone 
behind a cloud. Some of the input 
paths had constant velocities but 
others had motions that were posi- 
tively or negatively accelerated. In 
general, his findings show that con- 
stant velocities are accurately pre- 
dicted, but that the prediction of 
accelerations tended to be of a con- 
stant velocity rather than the re- 
quired increase or decrease in velocity. 
Gottsdanker interpreted this to mean 
that the subject responds on the 
basis of averages or integrations of 
preceding velocities. Two studies by
Vince (1953, 1955) used a technique 
very similar to Gottsdanker’s in in- 
vestigations of what she termed 
“intellectual processes” in skilled 
performance. Another paper of 
interest on this topic, but not directly 
related to tracking, is by Leonard 
(1953). 
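Gottsdanker's interpretation, that the subject extrapolates at an average of the preceding velocities, can be sketched as follows. The averaging window and the course parameters are hypothetical: the course has constant acceleration and disappears at a given moment, after which the "prediction" continues at the mean velocity seen over the preceding window.

```python
def true_position(t: float, x0: float, v0: float, a: float) -> float:
    """Input course with initial position x0, velocity v0, constant acceleration a."""
    return x0 + v0 * t + 0.5 * a * t * t


def mean_velocity(t_start: float, t_end: float, v0: float, a: float) -> float:
    """Average of v(t) = v0 + a*t over [t_start, t_end]."""
    return v0 + a * (t_start + t_end) / 2


def constant_velocity_prediction(t: float, t_hidden: float, window_s: float,
                                 x0: float, v0: float, a: float) -> float:
    """Project the course past the moment it disappears (t_hidden) at the mean
    velocity seen over the preceding window_s seconds, ignoring acceleration."""
    v_bar = mean_velocity(t_hidden - window_s, t_hidden, v0, a)
    return true_position(t_hidden, x0, v0, a) + v_bar * (t - t_hidden)


def prediction_error(t: float, t_hidden: float, window_s: float,
                     x0: float, v0: float, a: float) -> float:
    """Predicted minus actual position (negative means the prediction falls behind)."""
    return (constant_velocity_prediction(t, t_hidden, window_s, x0, v0, a)
            - true_position(t, x0, v0, a))


if __name__ == "__main__":
    for a in (0.0, 2.0, -2.0):
        print(a, prediction_error(3.0, 2.0, 0.5, 0.0, 10.0, a))
```

A constant-velocity course is predicted without error, a positively accelerated course is underestimated, and a negatively accelerated one is overestimated, which is the pattern reported for Gottsdanker's subjects.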

The studies by Poulton and by 
Gottsdanker have involved the learn- 
ing of prediction during the course of 
actual practice on a tracking task. A 
related line of investigation, which 
has been given some attention in 
another study by Poulton (1957a), 
is the effect of training to predict the 
stimulus source prior to actual motor 
practice in the total tracking task. 
This can be viewed as a part-whole 
transfer of training approach, where 
prediction responses are considered 
part of the response totality in track- 
ing. Granting this, prediction re- 
sponses should be trainable prior to 
whole-task practice and, in being a 
part of the total response complex, 
should have their strength reflected 
in the dependent motor response 
whose proficiency reflects the strength 
of all the response classes in the com- 
plex. This approach is quite similar 
to verbal pretraining methods where 
the operator is required to learn 
verbal responses to task stimuli prior 
to motor responses in the whole task 
(Arnoult, 1957; Goss, 1955). Al- 
though verbal pretraining studies 
have not dealt specifically with the 
problem of prediction responses, they 
are concerned with learned mediating 
responses where response-produced
stimuli are hypothesized to provide 
additional discriminative cues for 
the motor response (Goss, 1955; 
Osgood, 1953). Conceptually there- 
fore, they appear quite similar to 
prediction responses and one might 
hypothesize that an adaptation of 
these same methods can be used for
prior training in the prediction of
input events in a tracking task. How- 
ever, with our impoverished knowl- 
edge of the underlying nature of antic- 
ipatory mechanisms, it is plausible 
that prediction has nothing to do 
with mediating responses but, indeed,
may be fundamentally a
proprioceptive-oriented phenomenon.
Giving proprioception a role in anticipatory
behavior needs only the reasonable 
assumption that motor movements 
are conditioned to traces of proprio- 
ceptive stimuli and that, with prac- 
tice, the occurrence of a proper con- 
figuration of proprioceptive stimuli 
will tend to elicit the next correct 
motor sequence. Certainly this is not 
to deny intellective processes or me- 
diating responses as variables in pre- 
diction, but it does suggest that there 
might be at least two facets that deserve
experimental inquiry. “Prediction
response” is a commonly used
label for anticipation in this paper,
but it may eventually prove to be a
poor term if proprioception turns out to
be a paramount influence. The verbal
pretraining studies throw the balance 
of the explanatory weight at present 
in the direction of mediating re- 
sponses as the basis for anticipation, 
but definitive research on this topic 
remains to be done. 


Characteristics of the Measured 
Motor Response 


The basic nature of the motor 
movement activating the control 
system in a tracking task has been 
the subject of extensive discussion 
and controversy. The issue is whether 
the motor response is a continuous 
function of time or whether it is dis- 
continuous and intermittent. The 
intermittency hypothesis stems from 
arguments that a responding effector 
has a period of refractoriness or re- 
duced excitability before it can be 
made to respond in full strength 
again. Because this evidence stems
from molar behavior data, it is called 
psychological refractory phase to 
distinguish it from the physiological 
refractory phase of individual nerve 
fibers. The similarity of psychologi- 
cal refractory phase and physiological 
refractory phase is in terms of re- 
duced responsiveness following stim- 
ulation and response, but the levels 
of analysis of the two classes of phe- 
nomena are so different that it is per- 
haps safest to view them as analogous 
rather than stemming from a com- 
mon underlying process. 

Probably the first statement of 
psychological refractory phase was 
by Telford (1931) who found that 
reaction time to the second of a pair 
of auditory stimuli was lengthened 
if the time spacing of the two stimuli 
was reduced to .50 seconds, and he 
concluded that the subject becomes 
refractory in a manner comparable 
to the refractoriness of isolated nerves. 
Using Telford’s study as a point of 
departure, Vince (1948a, 1948b) asked 
whether refractoriness is present in 
continuous tracking to give the motor 
response an intermittent, impulsive 
quality. She concluded that intermittent
corrections every .50 seconds
are a basic feature of human tracking
responses in a manner quite compa- 
rable to Telford’s finding for discrete 
stimuli and a reaction time response. 
If her interpretation is correct, the 
notion of psychological refractory 
phase becomes an important general 
principle. But in criticism of Vince’s 
findings, psychological refractory phase 
refers to the periodicity of motor 
movements and not tracking error. 
Her conclusions were based on track- 
ing error records and periodicities in 
them are not a function of motor 
movements alone but of the difference 
between the output signal generated 
by the motor movements and the input
signal. Periodicities in the error
function may be
periodicities in motor movements 
but they are contaminated by the 
influence of the input signal and are 
not an unequivocal index on which 
to base conclusions about psycho- 
logical refractory phase as a mecha- 
nism for inducing intermittent motor 
corrections. 
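The criticism of Vince's use of error records can be illustrated with synthetic data. In this sketch all values are hypothetical: the input is a slow course plus a small component with a .50-second period, and the output is a perfectly smooth, continuous response that follows only the slow course, so the "motor" record contains no intermittency at all. The error record nevertheless shows a dominant .50-second periodicity, inherited entirely from the input.

```python
import cmath
import math


def dft_magnitude(signal: list[float], dt: float, freq_hz: float) -> float:
    """Magnitude of the discrete Fourier component of `signal` at freq_hz."""
    return abs(sum(x * cmath.exp(-2j * math.pi * freq_hz * k * dt)
                   for k, x in enumerate(signal)))


def dominant_frequency(signal: list[float], dt: float, candidates: list[float]) -> float:
    """Candidate frequency with the largest Fourier magnitude."""
    return max(candidates, key=lambda f: dft_magnitude(signal, dt, f))


def make_records(duration_s: float = 20.0, dt: float = 0.01, slow_hz: float = 0.25,
                 fast_hz: float = 2.0, fast_amp: float = 0.3):
    """Input = slow course + small fast component (period 1/fast_hz).
    Output = smooth response following only the slow course.
    Returns (input, output, error) records."""
    t = [k * dt for k in range(round(duration_s / dt))]
    inp = [math.sin(2 * math.pi * slow_hz * tk) + fast_amp * math.sin(2 * math.pi * fast_hz * tk)
           for tk in t]
    out = [math.sin(2 * math.pi * slow_hz * tk) for tk in t]
    err = [i - o for i, o in zip(inp, out)]
    return inp, out, err


if __name__ == "__main__":
    dt = 0.01
    _, out, err = make_records(dt=dt)
    cands = [0.25, 0.5, 1.0, 2.0, 3.0]
    print("output peak:", dominant_frequency(out, dt, cands), "Hz")
    print("error peak:", dominant_frequency(err, dt, cands), "Hz")
```

A periodicity in the error record therefore cannot by itself be attributed to the motor response; the motor record must be examined separately.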

With the exception of the foregoing 
studies by Vince, research on motor 
intermittence has been with discrete 
tasks, although many of the investi- 
gators have freely implied the gen- 
erality of the phenomenon to include 
continuous tracking. Mainly, these 
studies conclude that reaction time 
to a second stimulus of a pair will be 
lengthened if the interstimulus interval
is less than .50 seconds. A limit
to this generalization is that very 
brief interstimulus intervals cause 
the stimuli to be perceived as a single
entity, with the result that only a 
single response occurs. Vince (1948b, 
1949) and Hick (1948) have both 
used discrete tracking tasks and have 
provided additional corroborative evi- 
dence on psychological refractory 
phase. Craik (1947, 1948) and Wel- 
ford (1952) use these data for theo- 
retical discussions on the generality 
of psychological refractory phase as a 
determinant of intermittency in responding.
Poulton (1950) criticized
the tendency to regard the refractory
interval of .50 seconds as a human
constant, arguing that the quasirandom
presentation of stimuli in these
experiments did not allow the operator
to form a proper preparatory set.
When allowance is made
for the acquisition of a preparatory
set by having predictable stimuli,
Poulton found that the refractory
phase interval reduces to .20-.40
seconds. Davis (1956) and Elithorn
and Lawrence (1955) also discuss 
the role of anticipatory set and the 
psychological refractory period. A 
general discussion of research and 
views on this topic is presented by
Fitts (1951). 
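The refractory-phase findings summarized above can be expressed as a simple rule for the reaction time to the second of two stimuli. This is a deliberately simplified assumption, not the model of any study cited: after the first stimulus the operator is refractory for a fixed interval, and a second stimulus arriving within that interval waits for the remainder before its own reaction time begins. Stimuli closer together than a hypothetical grouping threshold are treated as a single event. All default values are illustrative.

```python
def rt_to_second_stimulus(isi_s: float,
                          base_rt_s: float = 0.25,
                          refractory_s: float = 0.50,
                          grouping_s: float = 0.05):
    """Reaction time to the second of two stimuli separated by isi_s seconds.

    Returns None when the pair is perceived as one entity (no separate
    second response). Defaults are illustrative assumptions.
    """
    if isi_s < grouping_s:
        return None
    # Delay equals the part of the refractory interval still unexpired.
    return base_rt_s + max(0.0, refractory_s - isi_s)


if __name__ == "__main__":
    for isi in (0.02, 0.10, 0.30, 0.60):
        print(isi, rt_to_second_stimulus(isi))
    # Predictable stimuli: Poulton's shorter refractory interval (.20-.40 s).
    print(rt_to_second_stimulus(0.30, refractory_s=0.30))
```

Varying `refractory_s` mirrors Poulton's point: the same rule with a shorter interval removes the lengthening at interstimulus intervals where a .50-second constant would predict it.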

Another aspect of the intermittency
hypothesis is that the duration
of patterned movements to discrete 
stimuli can be less than visual-motor 
reaction time. This has implied re- 
sponse discontinuity to some in- 
vestigators because the subject is 
executing response sequences mo- 
mentarily independent of the magni- 
tude of the visually perceived error. 
Because the response is not con- 
tinuously guided by the primary 
visual tracking error quantity, it is, 
for a time, open-loop or intermittent 
(Searle & Taylor, 1948; Taylor & 
Birmingham, 1948). These authors 
conclude that response movements 
are under a kind of “cam control”
where the visually perceived error 
triggers a cammed sequence. On the 
basis of past experience the “cam” 
runs off the continuously varying 
force pattern, including starting and 
stopping, and all without visual or 
proprioceptive feedback. 
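The “cam control” idea can be sketched as an intermittent, open-loop controller. Under assumptions that are illustrative rather than taken from Searle and Taylor: the visual error is sampled only every .50 seconds, and each sample triggers a preplanned smooth movement (a hypothetical cubic profile that starts and stops at zero velocity) which runs off over the following interval with no further visual or proprioceptive correction.

```python
def cam_profile(u: float) -> float:
    """Smooth 0-to-1 movement profile with zero velocity at start and end."""
    u = min(max(u, 0.0), 1.0)
    return 3 * u * u - 2 * u ** 3


def simulate_cam_control(input_fn, duration_s: float, dt: float = 0.01,
                         sample_interval_s: float = 0.5) -> list[float]:
    """Output positions of a controller that looks at the error only at
    sampling instants and otherwise runs off the planned movement blind."""
    steps = round(sample_interval_s / dt)
    n = round(duration_s / dt)
    output = []
    position = 0.0   # output position at the start of the current movement
    planned = 0.0    # displacement committed at the last visual sample
    for k in range(n + 1):
        if k % steps == 0:                          # sampling instant
            position += planned                     # previous movement completed
            planned = input_fn(k * dt) - position   # sample the visual error
        output.append(position + planned * cam_profile((k % steps) / steps))
    return output


if __name__ == "__main__":
    step_input = lambda t: 1.0 if t >= 0.1 else 0.0   # target jumps at t = .10 s
    out = simulate_cam_control(step_input, duration_s=1.0)
    print(out[30], out[75], out[100])   # t = .30, .75, 1.00 s
```

In this sketch a target jump at .10 seconds produces no movement until the next sampling instant at .50 seconds, after which the full correction is run off smoothly in one interval; between samples the output is wholly independent of the visually perceived error.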

Admitting the possibility that the 
continuous control of movements 
during an interval less than that required
for visual-motor reaction time
may be exercised through proprioceptive feedback,
Chernikoff and Taylor (1952) conducted
a study to see if kinesthetic
reaction time was sufficient to account
for the control of the response.
They concluded that continuous 
tracking behavior is best described 
by the intermittency hypothesis, 
analogous to cam control where very
brief movement sequences are run
off in the absence of visual and
proprioceptive guidance. Lashley (1951),
in a parallel line of argument, is in 
agreement that kinesthetic reaction 
time cannot explain many facts of 
motor responding such as the finger
movements of a skilled pianist moving
at about 16 per second. These
rates are too fast to allow kinesthetic
feedback after each one, and Lashley
postulates that some central sensory 
control is operating, presumably in a 
fashion similar to the cam hypothesis 
stated by Taylor and his associates. 
Craik (1947) holds a similar view. 
Arguing from piano playing to tracking
is tenuous, however, if for no
other reason than that a musical 
composition provides foreknowledge 
of a requirement for movement se- 
quences, and reaction time to each 
one is known to be greatly shortened 
under these special conditions (Vince, 
1949). Advance notice of stimuli is 
not a characteristic of tracking tasks. 
Moreover, Poulton’s work (e.g., 
1952b) has shown that learning to 
anticipate stimulus sequences is re- 
vealed in greatly shortened reaction 
time values. It is hardly surprising 
that a trained musician can some- 
times sidestep the restraints of an 
elementary afferent-efferent loop and 
receive guidance from learned, inter- 
nal sources. 

Gibbs (1954a, 1954b) in two im- 
portant papers effectively argues 
against the hypothesis that con- 
tinuous motor movements do not 
have continuous kinesthetic feedback 
guiding them. He points out that 
arguments based on kinesthetic reac- 
tion time fail to distinguish between 
the connecting and conducting func- 
tions of the central nervous system. 
Gibbs observes that kinesthetic reac- 
tion time to discrete stimuli can be 
considered the connecting time be- 
tween kinesthetic stimulation and 
overt motor response, and this has 
little bearing on continuous kines- 
thetic or neural conduction during 
voluntary movement. Gibbs bases 
his discussion on physiological data 
by Matthews (1933) which showed 
that a muscle had “tension” afferents 
and “stretch” afferents which, re- 
spectively, provide sensing of static 
position and of movement of a limb.
Tension afferents respond primarily
when the muscle is at rest and have an
electrical discharge approximately
proportional to the logarithm of the
tension. Stretch afferents, on the
other hand, respond when the muscle
is stretched in movement and have a
rate of electrical discharge proportional
to the rate of stretch, and
Gibbs holds that this is the source of 
continuous kinesthetic feedback mon- 
itoring. The subject must “know” 
limb position in guiding his move- 
ments and Gibbs holds that this is ob- 
tained by integrating the rate func- 
tion. The notion of a finite integra- 
tion period might suggest that Gibbs’ 
hypothesis is essentially the same as 
the intermittency hypothesis be- 
cause successive integrations might 
be revealed as intermittent move- 
ments of .50 seconds as limb position 
is successively “computed.” Ac- 
tually, the implications are quite 
different because Gibbs’ hypothesis 
would seem to hold that there are 
conditions where an integration in- 
terval of .50 seconds would apply 
but that integration intervals of 
longer duration are equally possible. 
Gibbs’ physiological hypothesis 
would seem to allow for perfectly 
smooth tracking movements of rela- 
tively long duration and, indeed, this 
is a common observation in tracking 
records. Oddly enough, relatively 
long periods of smooth responding in 
continuous tracking have not served 
as grounds for seriously challenging 
the intermittency hypothesis. Craik 
(1948) and Noble et al. (1955) remark
on these smooth responses and
offer the ad hoc explanation that 
intermittent movements are occur- 
ring in accordance with the principle 
of psychological refractory phase but 
that the subject’s acquired capability 
to predict input sequences has over- 
laid a smoothing effect. While pre- 
diction responses may well have some 
sort of smoothing influence, it also
may be true that the intermittency
hypothesis is false for continuous 
tracking and that relatively long, 
smooth responses frequently occur in 
the absence of prediction behavior. 
Gibbs’ work emphasizes the rather 
simple fact that the intermittency 
hypothesis has its validity derived 
from research on discrete tasks and 
its generalization to continuous 
tracking may be inappropriate. 
Gibbs’ use of Matthews’ findings
raises the interesting idea that proficiency
in making accurate accelerations
in tracking is related to the
subject’s ability to discriminate 
changes in the rate of kinesthetic 
impulses. One interpretation of 
Gottsdanker’s findings (1952a, 1952b,
1955, 1956) that the subject poorly 
predicts velocity changes is that he 
cannot kinesthetically discriminate 
with enough accuracy those velocity 
changes which he visually perceives. 
However, this interpretation must 
be approached cautiously because it 
fails to consider that the inability to 
discriminate velocity changes con- 
ceivably could be on the visual-per- 
ceptual side rather than the kines- 
thetic. To interpret Gottsdanker’s 
data properly we must, by indepen- 
dent operations, determine the rela- 
tive capabilities of perceptual and 
kinesthetic discrimination of accel- 
eration. If the operator cannot per- 
ceptually discriminate the velocity 
changes involved, then the motor 
response system is not receiving ade- 
quate information and the overt re- 
sponse cannot be expected to reflect 
information that has not been re- 
ceived. Or, conversely, the operator 
may be perfectly able to discriminate 
the velocity change perceptually but 
he may be unable to translate it into 
the proper accelerated movement be- 
cause he cannot make sufficiently ac- 
curate kinesthetic discriminations. 
Some work on the perceptual dis-
crimination of instantaneous changes
in velocity has been done by Hick
(1950) and Brandalise and Gotts-
danker (1959). Moreover, this general line
of reasoning suggests the hypothesis 
that the relative effectiveness of posi- 
tion, rate, and acceleration tracking 
may be related to the compatibility 
of perceptual and kinesthetic events. 


COMPENSATORY TRACKING 


The discussion of research and 
problems under the heading of pur-
suit tracking applies also to com-
pensatory tracking. Whatever dif-
ferences exist reside in the
different ways in which the two types 
of tracking tasks have their data 
organized on the display. The pre- 
sentation of only the error quantity 
in compensatory tracking means that 
performance usually will be poorer 
for two reasons: 

1. The operator cannot see the in-
put signal directly, which means that
he is handicapped in the acquisition 
of prediction responses. 

2. The operator cannot see the 
output signal directly, so he is handi-
capped in receiving knowledge of re-
sults. In addition to influencing the
acquisition of simple visual-motor 
learning where prediction behavior is 
absent, this factor also influences the 
acquisition of prediction responses 
because the operator cannot un-
equivocally verify the results of any
particular prediction response. 
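Both handicaps follow from the fact that a compensatory display shows only the error. The minimal Python sketch below makes this concrete under assumed, made-up signals; the function names and the convention error = input minus output are illustrative choices, not taken from the studies cited. Two quite different input/output histories yield the same compensatory display, whereas a pursuit display keeps them apart.

```python
def pursuit_display(inputs, outputs):
    """Pursuit display: target (input) and cursor (output) are shown separately."""
    return list(zip(inputs, outputs))


def compensatory_display(inputs, outputs):
    """Compensatory display: only the error, taken here as input minus output."""
    return [i - o for i, o in zip(inputs, outputs)]


# Two hypothetical trials, each given as (input samples, output samples).
# Trial A: the target stands still while the cursor drifts downward.
trial_a = ([0, 0, 0, 0], [0, -1, -2, -3])
# Trial B: the target drifts upward while the cursor is held still.
trial_b = ([0, 1, 2, 3], [0, 0, 0, 0])

# The pursuit display distinguishes the two trials ...
print(pursuit_display(*trial_a) == pursuit_display(*trial_b))
# ... but the compensatory display shows the same error record for both,
# so the operator cannot tell whether the input or the output moved.
print(compensatory_display(*trial_a) == compensatory_display(*trial_b))
```

The first comparison prints `False` and the second `True`: from the error record alone the operator can neither isolate the input regularities (point 1) nor verify what his own output did (point 2).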

Depending upon task circum-
stances, some prediction behavior
can be expected to form under com-
pensatory tracking conditions. The
error signal is a function of both the
input and the output signals, and at
times the regularities in the input 
will be discernible. Poulton (1952b) 
has shown that prediction behavior 
does occur with practice in compensa-
tory tracking but that prediction is
impressively superior in pursuit track-
ing. Undoubtedly this is one of the
factors which almost always renders 
pursuit tracking superior to com- 
pensatory tracking (Hartman & 
Fitts, 1955; Poulton, 1952b). 

Nor can we assume that the ab- 
sence of a direct presentation of the 
output signal means that knowledge 
of results is completely absent. There 
is evidence from a study by Cherni- 
koff and Taylor (1957) that when the 
input signal of a continuous tracking 
task is a low frequency input the sub- 
ject receives fairly adequate knowl- 
edge of results, probably because the 
motor movements produce output 
frequencies which are higher than the 
input frequency changes. This is de- 
duced from the slightly better per- 
formance that was found in this study 
for compensatory over pursuit track- 
ing when the input was a low fre- 
quency signal. At higher frequencies, 
they found that pursuit tracking 
maintained its well-known advantage 
over compensatory tracking. 


TWO-DIMENSIONAL TRACKING


A two-dimensional tracking task 
has two stimulus sources command- 
ing response, with each source hav- 
ing its own separate input signal and 
a dimension of the control system for 
response. An example of a two-di- 
mensional visual tracking task would 
be two voltmeter stimulus sources 
with a left-hand control lever for re- 
sponse to one and a right-hand lever 
for response to the other. Or, the 
two stimulus sources could have a 
bisensory distribution, with one vis- 
ual and one auditory. Our ignorance 
of variables involved in the various 
ways to organize a two-dimensional 
tracking task dictates that only a 
limited examination of some of the 
issues be made. The discussion will 
be restricted to two cases: spatially 
separated visual sources, and bisen-
sory sources where one is auditory
and the other is visual. Nothing of 
importance is known of the effects of 
control system design as it bears on 
the distribution of the two response 
dimensions among the possible effec- 
tor systems, so it will not be dis- 
cussed. Nor will the relative ad- 
vantages of pursuit and compensa- 
tory displays be discussed for what- 
ever special implications might be 
found for two-dimensional tasks. 
Because almost all of the research 
in tracking has employed one-dimen- 
sional visual tasks, it is unfortunately 
necessary to attempt this preliminary 
discussion of two-dimensional tasks 
on a rather thin foundation of em-
pirical findings. Perhaps the dearth
of analytical data on more complex
tracking tasks stems from the im-
plicit view of many psychologists
that it is desirable to progress in re- 
search from simple to complex sys- 
tems, and that the laws of complex 
systems will tend to fall into place 
once the relationships for simpler 
tasks are established. On the other 
hand, it is possible to defend the 
position that parallel law-seeking at 
two levels of analysis will result in 
two bodies of laws, each appropriate 
for its own domain. As these two 
bodies of knowledge develop, specific 
research can then be directed towards 
finding the empirical composition 
laws which express the interactions 
relating the laws of the two strata. 
If this view is allowed, it does not 
seem necessary that the study of 
multidimensional tracking tasks 
should await the codification of laws 
governing one-dimensional tracking. 
To facilitate exposition, the follow- 
ing terminology has been adopted. 
Each stimulus source and its dimen- 
sion of the control system will be 
called a component task of the total 
task. Response in the component 
task will be termed the component 
response of the total response. As
before, the observing response will 
serve to orient the receptors to the 
events emitted by the stimulus 
sources. 


Visual Tracking 


Observing response. One of the dis- 
tinguishing features of a two-dimen- 
sional visual tracking task is that 
there is not only the need for scan- 
ning within a source but also the more 
demanding requirement to scan be- 
tween sources. This added response 
requirement is importantly a product 
of the task variable called load (Con- 
rad, 1951, 1955). Load is defined as 
the number of stimulus sources and
has an expected interaction with the
rate of events emitted from each 
source. This latter variable has been 
termed speed (Conrad, 1951, 1954). 
Performance deteriorates with
increases in both speed and load. More-
over, it has been shown that response
proficiency is a function of the extent 
to which events in spatially distrib- 
uted sources overlap in time and com- 
mand two simultaneous responses 
(J. F. Mackworth & N. H. Mack- 
worth, 1956; N. H. Mackworth & 
J. F. Mackworth, 1956, 1957). An- 
other important task variable which 
would certainly interact with speed 
and load in determining the observing 
response is the amount of the spatial 
separation of the sources. Tracking 
proficiency as a function of the 
amount of spatial separation has not 
been systematically studied. 

Prediction responses. An impor- 
tant but unverified implication for 
prediction responses in a two-dimen- 
sional visual task is that they might 
reduce the major requirement for 
visually scanning the stimulus 
sources and improve tracking per- 
formance. Prediction responses in a 
one-dimensional task are known to 
benefit motor performance. We might 
hypothesize that in a two-dimen-
sional task there is not only predic-
tion within each source but also pre-
diction between sources. If the hu-
man operator can learn to predict
events within a source, it would seem 
that he might learn of the covaria- 
tion between the events of the two 
sources. Given an event in one source 
he would have some likelihood of cor- 
rectly predicting the concurrent 
event in the other source and conse- 
quently would not need to attend 
visually to this source as often. We 
know nothing of these matters but 
between-source prediction is a rea- 
sonable expectation. 
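To show what "learning the covariation" could buy the operator, here is a speculative Python sketch. It is not a model from the literature reviewed: it assumes, hypothetically, that concurrent events in the two sources are related linearly, fits a least-squares line from source A to source B, and reports the fraction of B's variance left unexplained. That fraction is a rough index of how much B would still need to be sampled visually.

```python
from statistics import mean, pvariance


def unexplained_fraction(a_events, b_events):
    """Fit b = slope * a + intercept by least squares and return the fraction
    of source B's variance that remains after predicting it from source A."""
    ma, mb = mean(a_events), mean(b_events)
    cov = mean((a - ma) * (b - mb) for a, b in zip(a_events, b_events))
    slope = cov / pvariance(a_events, ma)
    intercept = mb - slope * ma
    residuals = [b - (slope * a + intercept) for a, b in zip(a_events, b_events)]
    return pvariance(residuals) / pvariance(b_events, mb)


# Hypothetical event magnitudes sampled at the same moments in both sources.
source_a = [1, 2, 3, 4, 5]
covarying_b = [2, 4, 6, 8, 10]    # B moves in lockstep with A
unrelated_b = [1, -1, 0, -1, 1]   # B shows no linear relation to A

print(unexplained_fraction(source_a, covarying_b))  # near 0: little need to look at B
print(unexplained_fraction(source_a, unrelated_b))  # about 1: A says nothing about B
```

A linear, least-squares predictor is only one assumed form of between-source prediction; real operators might exploit nonlinear or sequential regularities that this sketch ignores.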

Component response differentiation. 
Two-dimensional tracking often in- 
volves two or more component re- 
sponse effector systems, such as both 
hands or a hand and a foot, and this 
raises the issue of motor interaction 
between the two systems. It is a 
common observation that initial 
stages of total response in a multi-
dimensional task are often typified by
a level of uncoordinated activity and 
error far greater than might be ex- 
pected from low habit strength in 
each component response separately. 
But, as practice proceeds, these in- 
teractions of component responses 
tend to drop out completely or show 
a marked decrease, with each par-
ticipating component response effec-
tor system becoming smoothly pro-
ficient. This phenomenon shall be
called component response differen-
tiation. Within the framework of his
S-R contiguity theory, Guthrie
(1952) discusses the acquired differ-
entiation of component responses:
[...] a habit to essentials makes [...]
no longer [...] greet a friend at the same
[...] is impossible because driv[...]
[...] playing, skating all include a mass of
action that is not essential to the performance
but is present because it is part of total associ-
ated complex bound together by conditioning.
In time, many irrelevant movements are
dropped out from the complex and the activ-
ity is limited to the muscles and the move-
ments required for the performance. This
process is, of course, never complete. Perfect
grace, which means the use only of the essen-
tial muscles and this use only to the point
necessary for the action, is only approximated,
never reached (p. 109).


How component response interac- 
tion is manifested in a two-dimen- 
sional tracking task is not known at 
this time. However, the extensive 
literature on experimentally induced 
muscular tension, which has been 
organized by Meyer (1953) in terms 
of physiological hypotheses, leaves 
little doubt that interaction of simul- 
taneous motor responses occurs. The 
concern of Meyer’s review and anal- 
ysis was the effects of experimentally 
induced muscular tension where us- 
ually a static, muscular tension-in- 
ducing component response accom- 
panies a more central learning ac- 
tivity, such as rotary pursuit or 
paired-associates learning. The ma- 
jor area of interest for two-dimen- 
sional tracking, but where much less 
is known, concerns total tasks where 
all component tasks impose a learn- 
ing requirement on their respective 
component responses. Perhaps, as 
Guthrie suggests, the interaction will 
all but disappear. But until a means 
of defining and measuring the course 
of component response differentia- 
tion in tracking is uncovered, there 
is no reason for discussion beyond 
this passing mention of a potentially 
important area. 


Visual-Auditory Bisensory Tracking 


The major issue for two-dimen- 
sional tracking with one visual and 
one auditory source is whether there 
is interaction which intrinsically pre-
vents the two stimulus event streams
from being processed simultaneously.
While we might intuitively surmise
that a total task organized in this 
manner will be superior to two-di- 
mensional visual tracking because 
each stimulus stream, with its own 
sense modality, gives the operator a 
higher load carrying capability in 
that he does not have to time-sample 
the sources with the observing re- 
sponse as he does in two-dimensional 
visual tracking, there is no evidence 
of the conditions under which this can
be true, if at all. As a first experi- 
mental question it would seem de- 
sirable to attack the pure case in 
two-dimensional bisensory tracking 
and ask whether it is possible to 
simultaneously process two stimulus 
streams without impairing interaction 
effects. Any research program should 
have a strategy which sets up a 
hierarchy of research questions whose 
answers are ordered in terms of their 
contribution to the delineation of 
variables and laws, and in bisensory 
tracking the best strategy is sug- 
gested to be one of first determining 
whether the human operator can 
process two event streams at once. 
Having determined the empirical
truth or falsity of this hypothesis, we 
will be in a better position to com- 
paratively examine the relative mer- 
its of all-visual and bisensory tasks. 
Later variables to consider would be 
the differential capabilities of the 
visual and auditory senses for dif- 
ferent classes of stimulus inputs 
(Henneman & Long, 1954). 
Subjectively we all have the con- 
fident feeling that we can handle 
visual and auditory events simul- 
taneously. It is commonplace to en- 
counter the observation that one 
can simultaneously read a book and 
listen to the radio. Yet, as with most
anecdotal accounts, such reports may be
true, but the absence of experimental
controls precludes any proof of the
thesis. These experiences of everyday
life can be explained just as plausibly
by rapid sensory shifting from one data stream to
another. Experimentally, the issue 
is a delicate one and will require 
careful analysis and experimentation 
to decide it conclusively. 

The experimental design necessary 
to prove or disprove that the human 
operator is a one-channel system 
must, as a minimum, show that per- 
formance of each component re- 
sponse in a bisensory tracking task 
will, after practice, be the same as 
performance when each component 
task is practiced out of total task 
context as a separate task. But what 
interpretation can be given if com- 
ponent response measures in bisen- 
sory tracking performance fail to 
achieve the level attained on part 
tasks? The hypothesis that the hu- 
man operator is a single channel data 
processing system is supported but 
the investigator is then faced with 
the new question of the locus of the 
interaction. There are at least four 
possibilities which must then be re- 
solved, although it will take some 
ingenuity and analysis to opera- 
tionally differentiate them for lab- 
oratory testing: 

1. The human operator is truly a 
one-channel system and, when two 
units of stimuli arise simultaneously, 
one must be temporarily stored while 
response occurs to the other. At the 
completion of the first response, the 
second stimulus unit is removed from 
central storage and response is made 
to it. 

2. No storage is required. The 
operator is capable of simultaneously 
processing two event streams but 
there is motor interaction which pre- 
vents the two responses from simul- 
taneously occurring with the same 
effectiveness that would be observed 
for any one of them separately. In 
effect, this hypothesis is consistent 
with Guthrie's position that there is
always some interaction between 
simultaneously functioning response 
systems, even after very large 
amounts of practice. Component re- 
sponse differentiation is never com- 
plete. 

3. No storage is required and there 
is no interaction of responses at the 
motor level. However, there is sen- 
sory interaction which results in a 
degradation in performance that 
would be absent if only one stream of 
stimuli were being handled. Evi- 
dence for sensory interaction is pre- 
sented in a number of papers (Child 
& Wendt, 1938; Gilbert, 1941; Gregg 
& Brogden, 1952; Hartman, 1934; 
London, 1954; Ryan, 1940). 

4. Combinations of the above 
three possibilities. 

The most relevant research on
simultaneous bisensory data process- 
ing for tracking is by Davis (1957). 
While he did not study tracking or 
even strict simultaneity of bisensory 
events, he did study the effects of 
very small time intervals between a 
visual and an auditory stimulus and 
the experiment makes a significant 
contribution to the topic. Following
the generalizations on psychological
refractory phase, Davis asked
whether the operator is refractory if 
the second of two successive stimuli 
impinges on a different sense modal- 
ity than the first. Using the reaction 
time response and stimuli of very 
brief duration, Davis found that the 
reaction time to the second signal in- 
creased as the interstimulus interval 
decreased. The data show that the 
phenomenon which has come to be
known as psychological refractory
phase operates for two successive
stimuli in two sense modalities about 
as it does for two successive events 
in a single sense modality. In some 
fashion, a “queuing of signals,” to
use the engaging phrase of Davis,
occurs whether stimuli arrive over
one or two sense channels. Davis 
finds his data consistent with a model 
of the human operator as a single 
channel information system. If we 
can assume that the processing of 
simultaneous events is a special zero- 
interval case of intervals for succes- 
sive stimuli, the extrapolation of the 
Davis findings to the zero interstimu- 
lus interval suggests a substantial 
impairment in performance. An em- 
pirical study of truly simultaneous 
events must be done but the Davis 
experiment is unquestionably pro- 
vocative on the simultaneity issue. 
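The single-channel ("queuing") interpretation and its extrapolation to a zero interval can be pictured with a toy model. This Python sketch is an illustration under stated assumptions, not Davis's model or data: the channel is assumed to stay busy with the first signal for a fixed time `rt1`, the second signal waits until the channel is free, and both timing parameters are invented values. Sense modality plays no role, which is what Davis's result suggests.

```python
def second_reaction_time(isi, rt1=0.25, rt2_alone=0.25):
    """Hypothetical single-channel (queuing) model of the refractory effect.

    The channel is assumed to be busy with the first signal for rt1 seconds.
    A second signal arriving isi seconds after the first waits until the
    channel is free and then takes its usual rt2_alone seconds.
    """
    wait = max(0.0, rt1 - isi)
    return rt2_alone + wait


# Shrinking interstimulus intervals, ending at the zero-interval case.
for isi in (0.5, 0.3, 0.2, 0.1, 0.0):
    print(f"ISI {isi:.2f} s -> predicted RT2 {second_reaction_time(isi):.2f} s")
```

Under these assumptions the predicted second reaction time is unaffected at long intervals, rises as the interval drops below `rt1`, and at zero interval equals `rt1 + rt2_alone`, one way of picturing the "substantial impairment" that the extrapolation suggests for truly simultaneous events.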


SUMMARY AND CONCLUSIONS 


This paper has reviewed some of 
the major issues and problems in the 
study of human tracking behavior. 
Apart from the complexities that are 
inherent in the analysis of closed-loop 
behavior, which is somewhat more 
complicated than the open-loop sit- 
uations used by most psychologists 
in their studies of human behavior, 
tracking behavior is beset with the 
added complications of mediating re- 
sponses and stimuli which are im- 
portant variables intervening be- 
tween the display and the measured 
motor response. Moreover, all of 
these variables assume further com- 
plications when they are cast in the 
matrix of multidimensional tracking 
tasks with two or more stimulus 
sources, each with a corresponding 
dimension of the control system for 
response to them. And, not only do 
multidimensional tasks have com- 
plications resulting from a compound- 
ing of the effects of variables found 
in one-dimensional tracking, but 
they have the added issues of how 
one or more sense modalities process 
the incoming data and how the 
component response systems interact 
throughout learning to become partly 
or completely noninteractive (differ-
entiated). We appear to be a long
way from understanding these fac- 
tors and, until we do, we are a long 
way from the beginnings of any kind 
of theory of tracking. British re- 
search has been most influential in 
illuminating the characteristics of 
tracking behavior, with its experi- 
mental examination of what is 
learned (e.g., prediction behavior in 
tracking), and its study of the inter- 
mittency hypothesis. This approach 
of British investigators would seem 
to be mandatory for our eventual 
theoretical description of tracking, 
and is in some contrast to the ap- 
proach of engineering psychologists 
in the United States who tend to em- 
phasize measures of tracking be- 
havior as a function of task variables 
and often bypass detailed analyses of 
the learned behavior. Some im-
portant exceptions to this emphasis
on the domestic scene have been the
early work of the Naval Research 
Laboratory, Gottsdanker, and recent 
work by Briggs and his associates on 
learning and transfer as a function of 
task variables. 

If this paper can be said to have a 
point of view, it is that tracking re-
search is in need of a rapprochement
of the interests of the engineering 
psychologist, with his focus on task 
variables and the measurement of 
time-based behavior, and the interests of
the traditional experimental psy- 
chologist who tends to emphasize be- 
havior as a function of variables 
which determine conceptual states 
such as habit, work inhibition, mo- 
tivation, mediating responses, etc. 
Physical servotheory has been a 
prominent attempt in engineering 
psychology to describe tracking be- 
havior, but the absence of variables 
defining conceptual states long 
known to influence behavior elimi- 
nates it as a psychological theory of 
any stature, quite apart from its for- 
mal shortcomings for the description 
of nonlinear human behavior. It is 
unlikely that a theory of tracking be- 
havior will emerge until these con- 
ceptual variables are included, along 
with time series measurement and 
task variables which traditionally 
have occupied engineering psychol- 
ogy. 
The purposes of this paper are to
consider some difficulties involved in 
matching problems with multiple 
judges and objects, and to present 
some appropriate techniques for the 
analysis of such data. The matching 
problem is the problem of evaluating 
the accuracy of a set of judgments 
about a series of objects. In the usual 
form of the problem, a judge places 
each object into one of several speci- 
fied and unordered categories. Since 
the number of categories is finite and 
is ordinarily small, each such set of 
judgments has an appreciable prob- 
ability of occurrence by chance. It 
will be obvious that there is always 
an external or a priori criterion for 
scoring each judgment as correct or 
incorrect. As Mosteller and Bush 
indicate (1954, pp. 307-308), the 
matching problem is present in many 
apparently different designs which 
call for identifying, diagnosing, or 
otherwise classifying objects, persons, 
or responses. Examples are guessing 
the order of a deck of ESP cards, 
diagnosing clinical cases, and identi- 
fying the products of each of several 
designated persons. 

The several questions asked by the 
experimenter may include: Can these 
objects be classified by these judges 
with better-than-chance success? If 
so, are some judges more successful 
than others, and are some objects 
more successfully classified than other 
objects? It will be seen that these 


1 I am indebted to many colleagues for their 
criticisms and suggestions on early drafts of 
this paper, especially: Desmond S. Cart- 
wright, Lee J. Cronbach, Lyle V. Jones, Jack 
Sawyer, Charles van Buskirk, and Virgil 
Willis. 
questions are not specific to the match-
ing problem: e.g., they may also be
asked in the analysis of responses to 
an aptitude test. What is distinctive 
about the matching problem is that 
the number of categories is usually 
larger than two, that each is of equal 
interest (as contrasted to the usual 
psychometric preoccupation with 
“right” answers), and that the num- 
ber of objects classified by each judge 
is usually small. When the judge 
makes judgments about many ob- 
jects in each of the true categories, 
psychometric scoring methods pro- 
vide scores that can be analyzed by 
familiar statistical techniques. 

The matching problem takes sev- 
eral forms, depending primarily on 
the number of categories into which 
the objects fall. The number may be 
from two to O (the number of ob- 
jects). Another variable in such 
designs is the judge’s information 
concerning the distribution of cases 
over categories: e.g., he may have 
some prior knowledge which would 
constrain his judgments, or he may be 
told how many objects fall in each 
category; if the categories are male 
and female, he would put approxi- 
mately 50% of the objects in each 
category, or he might be informed 
that exactly 50% were of each sex. 
For brevity, this paper will be limited 
to the general form of the problem in 
which the instructions do not fix the 
distribution of judgments and in 
which the objects are from a specified 
class. Thus the task may be to judge 
whether or not each object has a cer- 
tain property, or the analysis may be 
limited to evaluating the accuracy of 
judgments about objects in a given
category, these having been pre-
sented along with objects from other
categories.


THE ONE-VARIABLE CASE 


Consider the limited case in which 
one judge, j, makes judgments about 
each of a set of O randomly selected 
objects. The experimental question 
is whether his success is significantly 
greater or less than that which would 
be expected from chance matching 
(due to his ability, his biases, sys- 
tematic errors, etc.). Statistical 
techniques have been developed and 
presented by various writers. Mostel-
ler and Bush (1954) give the most
comprehensive account. Also perti-
nent are papers by Chapman (1934,
1935, 1936), Dudek (1952), McHugh
and Apostolakos (1959), Roberts
(1958), and Vernon (1936a, 1936b,
1936c). Mathematical treatments
and further references are given by
Battin (1942), Cochran (1950), Gil-
bert (1956), Stevens (1938), and
Wilks (1943, pp. 208-213).
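The paper leaves the choice of test to the sources just cited. As one hedged illustration, not necessarily a procedure those sources use, the Python sketch below computes the chance expectation of hits for a single judge with both the judge's and the true category counts held fixed, and then runs a Monte Carlo permutation test that re-pairs the judge's labels with the objects at random. The data, function names, and number of permutations are all hypothetical.

```python
import random
from collections import Counter


def count_hits(judgments, truth):
    """Number of objects placed in their true category."""
    return sum(j == t for j, t in zip(judgments, truth))


def chance_expected_hits(judgments, truth):
    """Expected hits if the judge's labels were paired with the objects at
    random, holding both the judge's and the true category counts fixed."""
    judged, true = Counter(judgments), Counter(truth)
    return sum(judged[c] * true[c] for c in judged) / len(truth)


def permutation_p_value(judgments, truth, n_perm=10_000, seed=0):
    """One-sided Monte Carlo p value: the share of random re-pairings of the
    judge's labels with the objects that score at least as many hits."""
    rng = random.Random(seed)
    observed = count_hits(judgments, truth)
    shuffled = list(judgments)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_hits(shuffled, truth) >= observed:
            at_least += 1
    return (at_least + 1) / (n_perm + 1)


# Hypothetical data: six objects, three categories, one judge.
truth = ["A", "A", "B", "B", "C", "C"]
judge_j = ["A", "A", "B", "B", "C", "A"]
print(count_hits(judge_j, truth))            # 5
print(chance_expected_hits(judge_j, truth))  # 2.0
print(permutation_p_value(judge_j, truth))
```

Here the judge scores 5 hits against a chance expectation of 2. Only 2 of the 60 distinguishable arrangements of the judge's labels reach 5 hits, so the Monte Carlo p value should come out close to 1/30.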
The conclusions from this design
obviously apply only to Judge j and
the population from which the O ob-
jects were drawn. Such a study would
be of value if positive results per-
mitted the conclusion that there is at
least one judge who can correctly
judge objects of this type. Negative
results would have no value unless
there were something distinctive
about the one judge.
In parallel fashion, one object, o,
may be judged by a set of J judges,
with conclusions applying to that
object and the population from
which the judges were drawn. While
such an experiment would be worth-
while if the object had special signifi-
cance, the inability to generalize to other
objects would usually make the study
of little value.
The necessity for representative
sampling in studies of this sort has
been pointed out by various writers
(Crow, 1954, 1957; Hammond, 1954;
Secord, 1952). (For a pertinent dis- 
cussion of the problems of interpret- 
ing the results of matching studies, 
see Cronbach, 1948.) 


THE TWO-VARIABLE CASE


The one-variable case can be repli-
cated with J different judges, each
being assigned a random sample of 
objects, the several samples being ob- 
tained independently. Ifa p value is 
obtained for each judge, the findings 
for the several judges can be pooled 
(e.g., through the chi square trans- 
formation—see Jones & Fiske, 1953). 
If the objects are randomly chosen, 
one can make inferences about judg- 
ments of objects in the population 
from which they were drawn. If the 
judges are also randomly drawn, 
inferences can be made to the popula- 
tion from which they came. Other- 
wise, the inferences must be limited 
to the particular objects or the par- 
ticular judges, respectively. This 
design is suitable for testing for non- 
randomness of judgments, but does 
not permit a comparison between ob- 
jects. It is not optimal for testing 
for differences between judges since 
differences between samples of ob- 
jects would contribute to apparent 
differences between judges. 
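Pooling the per-judge p values "through the chi square transformation" can be sketched as follows, assuming that transformation is the familiar Fisher method: −2 Σ ln p is referred to a chi-square distribution with 2k degrees of freedom for k independent p values. This Python sketch uses only the standard library, relying on the closed-form upper tail of chi-square for even degrees of freedom. The example p values are hypothetical.

```python
import math


def pool_p_values(p_values):
    """Chi-square pooling of independent one-sided p values (Fisher's method).

    X = -2 * sum(ln p) is referred to a chi-square distribution with 2k
    degrees of freedom; for even df its upper tail has the closed form
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term = total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total


# Hypothetical per-judge p values, none significant at .05 on its own.
chi2, pooled_p = pool_p_values([0.08, 0.12, 0.09, 0.15])
print(f"chi-square = {chi2:.2f} on 8 df, pooled p = {pooled_p:.4f}")
```

With these made-up values the pooled p falls below .05 even though no single judge reaches that level. The method presupposes that the judges' p values are independent, which holds in this design because each judge receives an independently drawn sample of objects.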

In another design, the O objects are
randomly assigned to the J judges
(J = O), each judge making one judg-
ment and each object being judged
once. This appears to be an excellent 
design for testing whether judges of a 
certain kind can judge correctly 
about objects of a specified type: a 
given amount of judge effort is spread 
over the largest possible number of 
objects so that the errors of sampling 
judges and objects would tend to be 
minimized. The resulting data can be 
analyzed by an appropriate statistical 
test from those in the references cited 
above: e.g., a test to determine 
whether the obtained proportion of 
hits departs significantly from that
expected on the basis of chance. 

Such a design is ordinarily not used 
because it does not permit analysis of 
judge differences, object differences, 
or judge-object interactions. It is 
also not economical of the experi- 
menter’s effort insofar as each judge 
must be instructed or trained and 
each object must be prepared for 
presentation to the judges. 

In a more common design, each 
judge judges each object. The re- 
sulting data can be recorded in a 
bivariate table, the rows indicating 
objects, the columns the judges, and 
the entry indicating success or failure 
(e.g., 1 or 0). 

But the several observations in 
each column may not be independent 
of each other, and neither are those in 
each row—just as, in the restricted 
case, the results apply only to the one 
judge or to the one object. Therefore, 
the set of JO observations cannot be 
treated as independent. To test the 
departure of the grand mean of the 
JO observations from chance expect- 
ancy, an error term is needed. The 
variance of the JO distribution is un- 
satisfactory because its actual de- 
grees of freedom are not known. This 
dependence among observations is the
crucial problem in most matching 
experiments. 

This critical problem has been seri- 
ously slighted or overlooked in previ- 
ous work. Mosteller and Bush (1954, 
p. 311) combine results for several 
judges on one set of objects without 
reference to the consequent restric- 
tion on the conclusion. Vernon 
(1936c) pointed out that significant 
differences between judges or between 
objects would introduce a marked 
bias in his method for analyzing 
matching data. He therefore pro- 
posed that, where such differences 
are known or suspected, the data for 
the average judge (or average object) 
be the basis for inference. This sug-
gestion would seem to involve throw-
ing out most of the available informa-
tion and increasing the danger of
making a Type II error. For example,
suppose that each judge does better
than chance but the data for the
average judge do not reach the
selected level of significance; the re-
sults for the several judges taken to-
gether might still attain significance.

When each judge judges each ob- 
ject only once, there is no satisfactory 
direct test of the observations as a 
totality. If the nature of the material 
permits each judge to make a number 
of judgments about each object, each 
judgment being truly independent of 
all previous judgments for that ob- 
ject, his score for that object can be 
stated as a proportion and an overall 
test could be developed on the basis 
of the discrepancies between these 
proportions and those predicted from 
the proportions for the given row and 
column. It is ordinarily not possible 
to obtain such a series of independent 
judgments. 

However, indirect tests using the 
judge or object means can be made. 
The experimenter can test whether 
the overall proportions for the several 
objects (each proportion being the 
mean success per object) differ from 
chance. He can also make a test of 
the set of mean successes per judge. 
These will be considered below. It 
should be noted here that while these 
two tests are of the same grand mean, 
they may lead to different conclusions 
because the variance of the judge 
means is ordinarily different from 
that of the object means. 
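The last point can be illustrated numerically. The Python sketch below takes a hypothetical objects-by-judges table of hits (1) and misses (0), computes the object means (rows) and the judge means (columns), and forms a one-sample t statistic against an assumed chance proportion from each set of means. Using a one-sample t on the margins is an illustrative assumption here, not a procedure prescribed in the text. Both statistics test the same grand mean but rest on different variances and degrees of freedom.

```python
import math
from statistics import mean, stdev


def margin_tests(table, chance):
    """table[o][j] is 1 if judge j classified object o correctly, else 0.

    Returns the grand mean and a one-sample t statistic against the chance
    proportion, computed once from the object means (rows) and once from
    the judge means (columns).
    """
    object_means = [mean(row) for row in table]
    judge_means = [mean(col) for col in zip(*table)]

    def t_stat(means):
        se = stdev(means) / math.sqrt(len(means))
        return (mean(means) - chance) / se

    return {
        "grand_mean": mean(object_means),
        "grand_mean_from_judges": mean(judge_means),
        "t_objects": t_stat(object_means),  # df = number of objects - 1
        "t_judges": t_stat(judge_means),    # df = number of judges - 1
    }


# Hypothetical 5 objects x 4 judges; chance proportion assumed to be .25.
table = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(margin_tests(table, chance=0.25))
```

In this made-up table both margins give a grand mean of .55, yet the two t statistics differ because the judge means and the object means vary by different amounts.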

There is a special case of this mul-
tiple judge and multiple object de-
sign in which one assumption of ran-
dom sampling is not made: examples
are studies of particular judges who
are distinguished by their obvious
expertness, or studies of a finite and
small class of objects of special inter-
est. Here the experimental question
concerns the performance of these
judges or the judgability of these objects.
Their role in the design is analogous
to that of the fixed constants in one 
model for analysis of variance, or to 
the operations or instrument for 
measurement in research in general. 
The conclusions are limited to the 
instrument since no random sampling 
is involved in the selection. 

In such designs, the appropriate 
procedure is to obtain a score for each 
application of the instrument, and to 
test the mean of these scores. For 
example, one might determine 
whether a group of experts can make 
a blind diagnosis of each of a random 
set of cases, the score being the num- 
ber of experts who correctly diag- 
nosed each case. The mean number 
right would be compared with the 
mean expected on a chance basis (as 
derived from the actual distribution 
of judgments over categories), the 
error term being based on the distribu- 
tion of scores for the several cases. 
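The per-case scoring can be sketched with hypothetical diagnoses by three experts. Each case's score is the number of experts who named its criterion category. The chance expectation for a case uses the share of all judgments that fall in that category. Forming the ratio as a one-sample t on the case scores is our assumption about how the error term would be applied.

```python
import math
from collections import Counter
from statistics import mean, stdev

# Hypothetical data: criterion category for each case, and each expert's call.
truth = ["A", "A", "B", "B", "A", "B"]
calls = [  # one list of expert judgments per case
    ["A", "A", "B"],
    ["A", "A", "A"],
    ["B", "B", "A"],
    ["B", "A", "B"],
    ["A", "B", "A"],
    ["B", "B", "B"],
]

# Chance rate for a category = its share of all judgments actually made.
all_calls = Counter(c for case in calls for c in case)
total = sum(all_calls.values())

scores = [sum(c == t for c in case) for t, case in zip(truth, calls)]
expected = [len(case) * all_calls[t] / total for t, case in zip(truth, calls)]

# Mean score vs. mean chance expectation; the error term comes from the
# spread of the case scores (len(scores) - 1 df).
t_stat = (mean(scores) - mean(expected)) / (stdev(scores) / math.sqrt(len(scores)))
print(scores, mean(expected), round(t_stat, 2))
```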

An alternative technique is to de- 
termine a p value for each case, and 
then combine the set of such values 
for the whole set of cases. The ex- 
pected proportion is the proportion 
of all judgments which fall in the 
category to which the case belongs. 
Given this value, the probability of obtaining or exceeding the observed number of successful judgments for the case can be found in tables for the binomial distribution. The p values for the several cases can be pooled by the chi square transformation of the several p values (Jones & Fiske, 1953). This method assumes that the degree of success of the judges on one case does not affect the degree of success on any other case.
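The binomial tail and the pooling step can be sketched as follows. We assume that the chi square transformation is the familiar combination chi square = -2 Σ ln p, with 2k degrees of freedom for k independent p values. For even df, the chi square upper tail has a closed form, so no tables are needed. The function names and the data (three judges per case, chance rate .5) are hypothetical.

```python
import math

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more successes."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def pooled_chi_square(p_values):
    """Combine independent p values as chi2 = -2 * sum(ln p), df = 2k.

    Upper tail for even df = 2k:
    P(chi2 > x) = exp(-x/2) * sum_{i<k} (x/2)**i / i!
    """
    chi2 = -2.0 * sum(math.log(p) for p in p_values)
    half = chi2 / 2.0
    k = len(p_values)
    tail = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))
    return chi2, 2 * k, tail

# Hypothetical: 3 judges per case, chance rate .5 for each case's category.
successes = [2, 3, 2, 2, 2, 3]
p_values = [binomial_upper_tail(s, 3, 0.5) for s in successes]
chi2, df, pooled_p = pooled_chi_square(p_values)
print(p_values)                           # .5 for 2 of 3 correct, .125 for 3 of 3
print(round(chi2, 2), df, round(pooled_p, 2))  # about 13.86 on 12 df, p about .31
```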

A refined method for evaluating the performance of each judge has been suggested to the author by Lee J. ... in a personal communication. The set of O judgments made by a judge can be scored in the usual way by counting the number of hits or correspondences with the criterion classifications or identifications. Then this same set of judgments by this j, treated as a distribution, is randomly assigned to the several o's, and the number of hits noted. This process is repeated for a number of trials sufficiently large to provide a sampling distribution from which one can estimate the chance probability of obtaining the actual number of hits earned by this j. This method takes into account the judge's biases or preferences for certain categories.
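The randomization procedure can be sketched in Python with a fixed random seed for reproducibility. `count_hits` and `randomization_p` are hypothetical helper names, and the four-object data are illustrative.

```python
import random

def count_hits(judgments, criterion):
    """Number of objects whose judged category matches the criterion."""
    return sum(j == c for j, c in zip(judgments, criterion))

def randomization_p(judgments, criterion, trials=20_000, seed=0):
    """Estimate the chance probability of at least the observed number of hits.

    The judge's own set of judgments is reshuffled over the objects, so the
    judge's preferences for particular categories are kept in every trial.
    """
    rng = random.Random(seed)
    observed = count_hits(judgments, criterion)
    pool = list(judgments)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if count_hits(pool, criterion) >= observed:
            at_least += 1
    return observed, at_least / trials

observed, p_est = randomization_p(list("ACBA"), list("ABBC"))
print(observed, round(p_est, 2))  # 2 hits; the estimate should be near 1/3
```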

For an oversimplified illustration, 
suppose that four o’s have the actual 
classification of A, B, B, C, and sup- 
pose j judges them to be A, C, B, A, 
respectively, giving him two hits. 
Since the number of o's is small, one 
can in this case determine the exact 
probability of two or more hits by 
comparing with the criterion order 
(A, B, B, C) each of the 12 possible 
orders of two A’s, one B, and one C. 
We find that 4 of the 12 yield two or 
more hits and hence the probability 
of this judge making two or more hits 
is .33. This probability is lower than the probability (about .41, from the binomial distribution) that would be expected on the assumption that the probability of a hit on each o is 1/3.
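The exact enumeration in this illustration can be checked directly. The comparison value assumes a hit probability of 1/3 on each o, independently, as with three equally likely categories.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

criterion = ("A", "B", "B", "C")
judgments = ("A", "C", "B", "A")

def hits(order):
    """Count positions where the order agrees with the criterion."""
    return sum(a == b for a, b in zip(order, criterion))

observed = hits(judgments)             # 2 hits
orders = set(permutations(judgments))  # the distinct orders of A, A, B, C
exact_p = Fraction(sum(hits(o) >= observed for o in orders), len(orders))

# Binomial comparison: a hit on each object with probability 1/3.
p = Fraction(1, 3)
binom_p = sum(comb(4, k) * p**k * (1 - p) ** (4 - k) for k in range(observed, 5))
print(len(orders), exact_p, binom_p)  # 12 1/3 11/27
```

The value 11/27 is about .41, so the judge's own response distribution makes two hits less likely by chance than the equal-probability model suggests.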

The same method could be applied 
to each object, with the resulting 
values of p for the O objects being 
evaluated as a set. This would ordi- 
narily be less appropriate than the 
approach based on the judges, be- 
cause it is known that individual 
judges have response sets and other 
biases which should be taken into ac- 
count. The extent to which judg- 
ments about a particular o are biased 
will probably be of smaller magnitude 
and will typically be of lesser interest. 
The selection of approach should be 
based primarily on one’s objective: 
if the objects are viewed as the instru- 
ment tested, p values are obtained 
for judges and an inference is made 
about the sampled population of 
judges. 

This method has the same type of 
rationale as that employed by Crow
(1954) to evaluate judges’ predictions 
of the responses of each of a set of 
patients. In this Random Compari- 
son Method of Chance Determina- 
tion, the distribution of predictions 
for all judges on all objects was com- 
pared to the actual behavior of each 
object and the resulting sets of dis- 
crepancy scores were pooled. It was 
then possible to determine for how 
many of the objects the given judge 
had done better than the median for 
all possible judge-object pairings. 
(For a study comparing correspond- 
ence between two records from each 
case with correspondence between 
paired records from different cases, 
see Kelly & Fiske, 1951, pp. 135- 
140.) 

Up to this point, only designs for 
testing the obtained mean success 
have been considered. It should be 
noted that such means may be signifi- 
cantly below (as well as above) the 
value expected from chance matching 
due to systematic errors in judgment. 
A complete analysis of such data 
would also test whether the variance 
of the obtained scores was signifi- 
cantly different from that expected 
with chance matching, since signifi- 
cant differences between judges or 
between objects, or significant judge- 
object interaction may be present, 
even when the mean is not signifi- 
cantly different from chance. 
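The variance test can be carried out by randomization. On each trial, every judge's own judgments are reshuffled over the objects, and the variance of the judges' hit counts is recomputed. This method is our illustrative choice, not one specified in the text. The helper names and the three judges' data are hypothetical.

```python
import random
from statistics import pvariance

def hits_per_judge(judgments_by_judge, criterion):
    """Hit count for each judge against the criterion classification."""
    return [sum(j == c for j, c in zip(js, criterion)) for js in judgments_by_judge]

def variance_randomization_p(judgments_by_judge, criterion, trials=5000, seed=1):
    """Estimate P(variance of judges' hit counts >= observed) under chance
    matching, with each judge's own judgments reassigned to objects at random."""
    rng = random.Random(seed)
    observed = pvariance(hits_per_judge(judgments_by_judge, criterion))
    count = 0
    for _ in range(trials):
        shuffled = [rng.sample(js, len(js)) for js in judgments_by_judge]
        if pvariance(hits_per_judge(shuffled, criterion)) >= observed:
            count += 1
    return observed, count / trials

criterion = list("AABBCC")
judges = [list("AABBCC"), list("CCAABB"), list("ABCABC")]  # hypothetical judges
observed_var, p_var = variance_randomization_p(judges, criterion)
print(observed_var, p_var)
```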

It must be emphasized again that 
in all matching studies involving 
multiple judgments about a set of 
objects, the several observations can- 
not be treated as independent and 
the experimental design must take 
explicit cognizance of this restricting 
condition. 


Testing for Differences between Judges 
or between Objects 

The question of differences between the performances of judges or differences between objects in the ease with which they are correctly judged is often of interest, even when the nonrandomness of the judgments has not been tested. If nonrandomness has been tested, the question of individual differences may be of interest even when the mean does not depart significantly from chance.

Such testing for differences would 
be straightforward if the observations 
were a continuous variable: e.g., if 
each object could be judged at several 
different times by each judge. When 
the data are recorded as discrete 
observations (1 or 0), techniques such 
as conventional analysis of variance 
are not suitable. 

One can, however, use the analysis 
of variance of ranks or W, the co- 
efficient of concordance (Kendall, 
1948). To test for differences between 
objects, we would rank the O values 
for each judge—i.e., we would assign 
the appropriate (tied) rank to all 
successes and similarly with the failures, making the appropriate adjustment for the ties. As in all designs where the objects are judged by all judges, it would be necessary to have the order of judgment randomized to control for effects of position on success.

Also appropriate is the chi square 
test developed by Cochran (1950), 
which is available in Siegel (1956). 
This is a generalization of the test for 
two related groups that has been 
offered by McNemar (1955, pp. 228- 
231). Empirical examples indicate 
that, when the observations take the 
values 1 or 0, Cochran’s statistic, Q, 
gives essentially identical results with 
the chi square for ranked data that is 
related to the coefficient of concord- 
ance, provided the correction for ties 
is made. 

One caution should be stated. As Cochran (1950) points out for his statistic, rows with identical entries (all successes or all failures) have no effect on the Q for columns; Q is equivalent to the chi square from ranks only when the computation of the latter statistic omits such rows. This omission of arrays without variance, comparable to the exclusion of ties in the sign test, may influence the experimenter's interpretation when a substantial proportion of the rows is involved.
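Cochran's Q has a standard closed form in the row totals R_i, the column totals C_j, the grand total N, and the number of columns k. The sketch below computes it for a hypothetical judges-by-objects table. It also illustrates the point about rows with identical entries: appending an all-success row and an all-failure row leaves Q unchanged.

```python
def cochran_q(table):
    """Cochran's Q for a 0/1 table (rows = judges, columns = objects).

    Q = (k - 1) * (k * sum(C_j**2) - N**2) / (k * N - sum(R_i**2)),
    referred to chi square with k - 1 df for differences between columns.
    """
    k = len(table[0])
    col_totals = [sum(col) for col in zip(*table)]
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("every row is all successes or all failures")
    return numerator / denominator

hits = [  # hypothetical judges x objects
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 1],
    [1, 1, 0, 0, 0, 1],
]
q = cochran_q(hits)
q_padded = cochran_q(hits + [[1] * 6, [0] * 6])  # add rows without variance
print(round(q, 2), round(q_padded, 2))  # both 225/33, about 6.82, on 5 df
```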

The row means are taken as fixed 
in Cochran’s test, and it would appear 
that the same holds for the chi square 
from ranks. For example, a different 
set of row means might involve a 
greater or smaller number of rows 
with identical values. In general, 
variances of row and column means 
tend to be negatively related: if the 
effect for objects is large, it may pre- 
clude the possibility of a significant 
effect for judges. Therefore, the inference from such data must be restricted to populations with distributions of row means similar to that for the data at hand.

When the experiment involves repeated judgments by each judge about each object, or when each judge judges several objects of each of several kinds, the entries can be proportions rather than 1 or 0. Under these conditions, it is possible to control column effects while testing for row effects. Mood (1950, pp. 399-402) describes two methods, an exact test for small Ns and a test based on chi square. They require the assumption that interactions are zero, unless the judges or the object classes have been randomly chosen.


THE MULTIPLE CATEGORY CASE 


The preceding presentation has ignored the categories to which the objects belong. In most designs, there are several categories with several objects in each. The same principles of design and inference apply to this case. In a design where the same judge or judges are used for all objects, the conclusions are limited to the judge(s) as an instrument. The accuracy of judgments should be evaluated for each category separately. This can be done by obtaining a score for each case: e.g., the probability of 2 judges being correct, with the chance value being determined from the relative frequency of the category in the obtained judgments. When the relative accuracy for one case can be assumed to have no effect on the accuracy for any other case, these p values can be pooled.

The multiple category design has 
one type of dependence not found in 
the simpler designs: the relative ac- 
curacy for one category may affect 
that for other categories. In the ex- 
treme case of two categories, A and 
B, the experimenter cannot distin- 
guish between success in judging 
cases as A and success in judging 
cases as not-B, and conversely for B 
and not-A; hence no inference is 
possible concerning the relative suc- 
cess for the two categories. 

Whenever the number of categories 
is small, some interdependence will 
be present and any tests of differen- 
tial success for the several categories 
will be biased in favor of accepting 
the hypothesis of no difference unless 
this dependence is taken into account. 
However, an approximate test might 
be a one-way analysis of variance 
where each entry represented the 
difference, for a single case, between 
the obtained proportion and the ex- 
pected proportion of correct judg- 
ments. (An index of this variety, 
suggested by David Wallace, was 
used in a complex matching study by 
Henry & Farley, 1959.) 
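The approximate test can be sketched as follows. The hypothetical entries are already expressed as obtained minus expected proportions correct for each case. Like the test it illustrates, this sketch ignores the interdependence among categories, so the F ratio should be read as approximate.

```python
from statistics import mean

def one_way_f(groups):
    """F ratio and (between, within) df for a one-way analysis of variance."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), (k - 1, n - k)

# Hypothetical entries: for each case, the obtained proportion of correct
# judgments minus the expected (chance) proportion, grouped by category.
by_category = {
    "A": [0.2, 0.1, 0.3],
    "B": [0.0, -0.1, 0.1],
    "C": [0.4, 0.2, 0.3],
}
f_ratio, df = one_way_f(list(by_category.values()))
print(round(f_ratio, 3), df)  # 7.0 (2, 6)
```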


SUMMARY 


This paper considers some prob- 
lems of design and inference that are 
found in studies using the matching 
method with multiple judges and ob- 
jects. When each object is judged by 
each judge, the analysis must take 
into account the dependence among
the several observations in each set. 
Failure to recognize and allow for this 
dependence is a common oversight 
in the design of matching studies. 
Frequently the judges or the objects 
COMMENT ON “A NOTE ON PROJECTION”

BERNARD I. MURSTEIN¹
Interfaith Counseling Center, Portland, Oregon

In a recent issue of the Psychological Bulletin Chase (1960) discussed an article by Murstein and Pryer (1959). Referring to this article, Chase stated that there appear to be “rather glaring faults in formulation, categorization and definition”² (p. 289). It would seem, however, that Chase misses the mark in two important respects:

1. There is a gross confusion in distinguishing what Pryer and I say about projection in the critique portion of our paper from what we report others as saying in our review of the literature.

2. There are several misinterpretations of psychological terms in Chase’s criticism.

We defined “attributive” projection as “The ascribing of one’s own motivations, feelings and behavior to other persons” (Murstein & Pryer, 1959, p. 354). Chase (1960), finding this definition similar to the one listed in English and English (1958), describes our definition as “clearly redundant” (p. 289). It should be noted, however, that the purpose of a review is not to invent new definitions but to classify and order the research publications bearing on the topic under consideration. The fact that our definition overlaps with one given by English and English indicates only that we have succeeded in identifying one of the common usages of the term.

¹ I should like to express my appreciation to Nelson Pareis and Martha Pareis for reading the manuscript and offering their valuable comments.

² As senior author, I assume full responsibility for all “glaring faults in formulation, categorization and definition” attributed to our paper by Chase and accordingly am replying to his comments.

We are further belabored by Chase 
for our putative disregard of the 
“unconscious” and the “self-concept” 
in our discussion of “attributive” 
projection. Again, there is a failure to 
understand that we did not advocate 
this omission but only reported a 
condition familiar to most persons 
conversant with the literature on 
projection—namely, that most opera- 
tional usages of the term imply noth- 
ing about an unconscious or a self- 
concept. A typical operational defini- 
tion is described by Bender and 
Hastorf (1953). A statement “I am wary about the trustworthiness of persons whom I do not know well” may be answered affirmatively by a subject (S). If he now predicts that his friend would answer the item similarly, we have an example of attributive projection. The congruency of answers, however, may be based on the similarity of the personalities, answering habits, experiences, or cognitive evaluations of the two Ss without implying any notions about the unconscious or self-concept. It is exactly this lack of concern with the self-concept which led Pryer and me to criticize this approach.

Chase criticizes our use of termi- 
nology. Only the most meaningful of 
these comments will be considered. 

1. We quoted Zilboorg’s example 
(1935) of medieval projection which 
Chase regards as an example of an 
hallucination rather than of projection. He failed to realize that an hallucination is but one kind of projection. Thus, Murray (1933) says:


We may speak of perceptive projection when 
sensory elements are projected, i.e., when an 
image takes on the vividness, substantiality, 
and out-thereness of a real object—as in... 
an hallucination (p. 313). 


2. Chase believes that our term 
“rationalized projection” is simply 
another name for rationalization. 
Rationalized projection refers to an occasion in which S, while not denying the possession of an unattractive trait or the fact that he committed some unsavory act, does deny responsibility by projecting the cause of his behavior onto another. Rationalization is defined by English and English (1958) as


the process of giving rational order or inter- 
pretation to what was previously merely a 
vague intuition, or was chaotic and confused 
(p. 438). 


Since rationalization may serve to 
make an event comprehensible with- 
out necessarily involving recourse to 
self-deception, it is apparent that 
rationalized projection is but one of 
many kinds of rationalization. 

3. A similar confusion of species with genus underlies Chase’s attempt to equate “autistic projection” with autism. The former is a term used to describe misperceptions due to hunger, thirst, or a “set” of some kind. The latter is defined by English and English (1958) in one of their definitions as

finding pleasure in fantasies that represent reality in wish-fulfilling terms, even when these are not believed (p. 54).

It is evident that autistic projection is one form of autistic expression.

4. In an attempt to encompass the 
extremely diverse and varied uses of 
projection, Pryer and I settled for a 
broad definition of it which included 
emotional value or need. Chase interprets this as subsuming “defecation.”
It is, however, unusual to regard this 
act as involving an emotional value 
for the majority of persons, Freud 
notwithstanding. 

Finally, a new classification of types of projection is offered by Chase (1960):

Two major categories are immediately obvious. We might term one type defensive projection and the other predictive projection (p. 289).


I should like to illustrate via a brief 
example why this division is not satis- 
factory. On the eve of election day in 
1948, a noted commentator, H. V. 
Kaltenborn, predicted a Dewey vic- 
tory. Far into the night, as the stun- 
ning reversal of expectation became 
manifest, Kaltenborn relied on his 
long experience to avoid being swayed 
by “early city returns” which favored 
Truman. Surely this is a bona fide case of predictive projection, but does it not also smack of defensive projection?

Though I question the validity of Chase’s criticisms, there is no doubt
that a good deal of work still remains 
in the area of projection. Hopefully, 
we will yet evolve an operational 
definition retaining the historical 
meaning, which can also be experi- 
mentally validated. 
In various reviews of the literature, psychologists have stressed the dependence of the perception of motion upon a multitude of factors. Kennedy (1936), for example, indicated in his review that this dependence necessitates rigorous control in the experimental method used for measuring thresholds. The need for careful analysis and experimentation, also stressed earlier by Neff (1936), has been restated more recently by Graham (1951) and Gibson (1958). Despite the caution suggested by these reviews, analysis of data available in the literature for a specific threshold proves fruitful for application to a more general form of behavior. The purpose of the present paper is to discuss this analysis.

Visual sensitivity to differences in velocity is commonly measured by presenting two objects which move at slightly different, but constant, speeds. The least detectable difference in speed is the differential threshold for the magnitude of velocity. As an initial step in the paper, consideration of angular speed indicates that it is the basic unit of measurement involved in studies of the differential threshold. Plotting differential thresholds for angular speed yields a meaningful relation to a primary variable, the speed of object motion. From these thresholds, the sensitivity is readily calculated and expressed in terms of the ratio of the threshold to the speed. As a final step in the paper, this Weber ratio for velocity is applied to tracking and other predictive behavior.


DIFFERENTIAL SPEED THRESHOLDS 


Angular Speed

Graham (1951) has described the 
concept of visual angle and the util- 
ity of specifying stimulus extents in 
terms of the angle they subtend at 
the eye. Similarly, the visual angle 
per unit time or angular speed is a 
basic variable in experiments con- 
cerned with the visual perception of 
movement. Its use facilitates the 
comparison of data obtained under 
different conditions. For example, 
threshold measurements made in in- 
dependent experiments at varying 
observational distances are expressed 
in terms of a common measure, angu- 
lar speed. In addition, the use of 
angular speed as a stimulus specifica- 
tion may be necessary for good ex- 
perimental design. 

In Figure 1, the axis of rotation at O may be specified in terms of a convenient reference point such as the front surface of the cornea. The radius of rotation (r) is given by the distance from the reference point to the appropriate moving object. When the eye looks steadily at fixation point A, the line of regard OA is stationary. Alternatively, one may assume a rotating line of regard in experiments involving fixation on a moving object. Presently available data do not indicate unequivocally that the alternative assumptions yield a measurable difference in the perception of velocity. Fleischl (1882) reported that an object seen while fixating a stationary point moves subjectively faster than when followed by the eyes. Since Aubert (1886) confirmed the phenomenon, it has been called the Aubert-Fleischl paradox. However, the need for reexamination of the paradox is indicated by the recent work of Gibson, Smith, Steinschneider, and Johnson (1957). When they measured the accuracy of visual perception of motion, they found no difference for the two modes of observation.

Fig. 1. Diagram representing components in the measurement of angular speed.

As a stimulus rotates about the reference point in Figure 1, its instantaneous angular speed (ω) is given by:

$$\omega = \frac{d\theta}{dt} \qquad [1]$$

where θ is the angle swept by the radius vector r in time t. The value of θ is given by:

$$\theta = \frac{s}{r} \quad \text{(in radians)} \qquad [2]$$

or

$$\theta = \frac{57.30\, s}{r} \quad \text{(in degrees)} \qquad [3]$$


The measure angular speed may be used advantageously not only for rotational motion but also for tangential motion. In Figure 1, the rectilinear distance d is a close approximation to the arc s for angular displacements of the magnitude usually used. For example, d exceeds s by only 1% for a θ of 10°. Conversely, angular displacements less than 10° may be calculated with less than 1% error by substituting d for s in Equation 3. For greater displacements, θ is calculated from:

$$\theta = \arctan \frac{d}{r} \qquad [4]$$
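The 1% figure can be checked if, following the geometry of Figure 1, the rectilinear distance is taken as d = r tan θ and the arc as s = rθ. The radius then cancels in their ratio. The function names are illustrative.

```python
import math

def chord_excess_percent(theta_deg):
    """Percent by which d = r*tan(theta) exceeds the arc s = r*theta (r cancels)."""
    theta = math.radians(theta_deg)
    return 100 * (math.tan(theta) - theta) / theta

def angle_from_displacement(d, r):
    """Equation 4: theta = arctan(d / r), returned in degrees."""
    return math.degrees(math.atan(d / r))

print(round(chord_excess_percent(10), 2))  # about 1% at 10 degrees
print(round(chord_excess_percent(5), 2))   # smaller for smaller angles
print(angle_from_displacement(100, 200))   # large displacement: use the exact form
```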


For uniform angular motion when ω is constant:

$$\omega = \frac{\theta}{t} \qquad [5]$$


Although this equation is a special case of the earlier definition of instantaneous angular speed in derivative form, it applies with very few exceptions to experiments which have been conducted on the perception of movement. By substitution for θ from Equation 2, uniform angular speed may be described by:

$$\omega = \frac{s}{r\,t} \quad \text{(in radians per unit time)} \qquad [6]$$

where the arc s and the radius r are expressed in the same units. As an approximation for small angular displacements, we may substitute d for s to obtain:

$$\omega = \frac{d}{r\,t} = \frac{v}{r} \quad \text{(in radians per unit time)} \qquad [7]$$

$$\omega = \frac{57.30\, v}{r} \quad \text{(in degrees per unit time)} \qquad [8]$$

where the uniform linear speed v and the observational distance r are expressed in consistent units.
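Equations 7 and 8 can be applied directly. The sketch uses 180/π (about 57.30) as the radians-to-degrees factor. The target speed and viewing distance are hypothetical.

```python
import math

DEG_PER_RAD = 180 / math.pi  # 57.2958..., given as 57.30 in Equations 3 and 8

def angular_speed_rad(v, r):
    """Equation 7: omega = v / r in radians per unit time (small displacements)."""
    return v / r

def angular_speed_deg(v, r):
    """Equation 8: omega = 57.30 * v / r in degrees per unit time."""
    return DEG_PER_RAD * v / r

# Hypothetical example: a target moving 10 cm/s viewed from 200 cm.
print(round(angular_speed_deg(10, 200), 2))  # about 2.86 degrees per second
```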


The Differential Speed Threshold and Its Measurement


The differential threshold for angular speed, Δω, may be defined by:

$$\Delta\omega = \omega_2 - \omega_1 \qquad [9]$$

where ω2 is a uniform angular speed which the observer discriminates according to a specified criterion from the constant rate of motion ω1. In measurements of Δω, the spatial relationship of ω1 and ω2 is critical. Three procedures used to date involve stimuli which are separate, adjacent, and superimposed. In Figure 2, a circle represents schematically an outline of a display, such as a moving belt, rotating disc, or cathode-ray tube, used in presenting ω1 and ω2. The speeds are represented by the vectors in each display. In Procedure a, the stimuli for the two speeds are spatially apart and are viewed by looking from one display to the other. In Procedure b, the stimuli are in immediate proximity. In Procedure c, they are superimposed on each other. Table 1 summarizes the most significant stimulus conditions present in measurements of Δω.

Fig. 2. Procedures used in presentations of stimulus motion: (a) separate, (b) adjacent, (c) superimposed.

At least six experiments have been reported for measurements involving separate stimuli. Bourdon (1902) utilized two rotating white discs with a black rectangle on the edge of each. The subject adjusted the speed of one in increments until it was noticeably faster than the other. Similar measurements were made by Brown (1931) and by Brown and Mize (1932) for a black square moving upward on white paper which the observer saw in either of two windows. Ekman and Dahlbäck (1956) and Gibson et al. (1957) have made measurements involving the adjustment of ω2 for apparent equality with ω1. The former utilized two apertures in each of which alternately the observer saw the horizontal motion of black vertical lines on white paper. The latter presented behind two windows a downward moving wallpaper with a pattern of dots. Most recently, Brandalise and Gottsdanker (1959) have had subjects adjust the speed of rotation of a black disc with a white dot on its edge to apparent equality with that of another. In these six experiments, the measurements of Δω were based on comparisons of the two speeds which were viewed separately in different places. Since the equipment involved rotating drums or discs, stimulation was repetitive.

TABLE 1
STIMULUS CONDITIONS PRESENT IN MEASUREMENTS OF Δω

Experimenters | Spatial Relation | Stimulus | Stimulus Objects | Direction of Motion | Extent (degrees) | Observation Distance
Bourdon (1902) | Separate | Repetitive | Black rectangle on edge of 2 white discs | Circular | 6.4 | 200
Brown (1931) | Separate | Repetitive | Black square on white paper | Rectilinear upward | 2.15-4.30 | 200
Brown & Mize (1932) | Separate | Repetitive | Black square on white paper | Rectilinear upward | 2.15-4.30 | 200
Zegers (1948) | Superimposed | Single | 2 needles perpendicular to line of sight | Rectilinear to S's right | 3.6-15.0 | 15.9
Hick (1950) | Adjacent | Single | Spot on oscilloscope | Rectilinear to S's left | 4.8 | 53.3
Ekman & Dahlbäck (1956) | Separate | Repetitive | Black vertical lines on white paper | Rectilinear to S's right or left | 5.72 | 50
Gibson, Smith, Steinschneider, & Johnson (1957) | Separate | Repetitive | Wallpaper with pattern of dots | Rectilinear downward | 8.4 | 122
Notterman & Page (1957) | Adjacent | Single | Spot on oscilloscope | Rectilinear horizontal | 10.0 | 25.4
Brandalise & Gottsdanker (1959) | Separate | Repetitive | White dot on edge of 2 black discs | Circular | 5.2 | 200


Use of a moving spot on an oscilloscope has facilitated presentation of adjacent stimuli. During rectilinear motion of a pip at constant speed, an incremental change in speed is introduced. Hick (1950) and Notterman and Page (1957) measured the differential threshold in speed for a pip as it was horizontally deflected across the face of a cathode-ray tube. Temporal features of this procedure differ from the first: the stimuli are presented only once and in immediate succession.

The procedure of superimposed stimuli may be illustrated by monocular movement parallax. When two objects move at the same linear speed perpendicular to the subject’s line of sight, the difference in their angular speeds provides an indication of their distances from the subject. As the objects are brought closer together, the difference in angular speeds decreases to a threshold value. Zegers (1948) has measured the differential threshold speed for two needles by this procedure, which temporally involves the single, simultaneous presentation of ω1 and ω2.

TABLE 2
METHODOLOGY USED IN THE MEASUREMENTS OF Δω

Experimenters | Psychophysical Method | Measure | No. of … | No. of … | No. of … | Total No. of Measurements | Minimum Speed (degrees per second) | Maximum Speed (degrees per second)
Bourdon (1902) | Limits | Mean | 3 | 1 | 20 | 60 | 0.77 | 5.04
Brown (1931) | Limits | Mean | 2 | 2 | 10 | 40 | 1.79 | 3.58
Brown & Mize (1932) | Limits | Mean | 6 | 2-3 | 3-6 | 117 | 1.72 | 4.58
Zegers (1948), 3.6° field | Constant stimuli | Standard deviation | 4 | … | … | … | … | …
Zegers (1948), 15.0° field | Constant stimuli | Standard deviation | 6 | … | … | … | … | …
Hick (1950) | Constant stimuli | Mean | 7 | 18 | … | … | 0.15 | 10.2
Ekman & Dahlbäck (1956) | Average error | Standard deviation | 5 | 10 | 4 | 200 | 2.07 | 4.81
Gibson, Smith, Steinschneider, & Johnson (1957) | Average error | Standard deviation | … | … | … | … | … | …
Notterman & Page (1957) | Constant stimuli | Mean | 7 | 10 | 30 | 2100 | 0.34 | 22.7
Brandalise & Gottsdanker (1959) | Average error | Standard deviation | … | … | … | … | … | …

The psychophysical method used has been less critical for measurements of Δω than the spatial relationship of the stimuli. Table 2 lists significant methodological characteristics for the nine experiments. Especially worth noting are the limited range of speeds in most experiments and the small number of measurements in some studies.


The Differential Threshold as a 
Function of Speed 


The marked effect of spatial order may be observed by inspection of Figure 3, in which Δω is plotted against ω. The curves and their points represent the use of adjacent, separate, and superimposed stimuli. In each case, the measure of Δω is that indicated in Table 2. Since Brown (1931) and Brown and Mize (1932) made only a small number of exploratory measurements, the points plotted for their experiments are the geometric means of values they reported for speeds of 1-2, 2-3, 3-4, and 4-5° per second. The data plotted for superimposed stimuli represent the monocular movement parallax thresholds obtained by Zegers (1948) with the widest and narrowest visual fields of the four for which he made measurements. Otherwise, the points represent all values reported in the literature for Δω as listed in Table 2.

Fig. 3. The differential angular speed threshold (Δω) as a function of the angular speed (ω) of stimuli which are presented spatially adjacent, separate, and superimposed.

The solid lines have been drawn with unit slope and represent a constant Weber fraction (Δω/ω). In the case of adjacent stimuli, solution for the intercept constant by the method of least squares yields the plotted equation:

$$\log \Delta\omega = -0.859 + \log \omega \qquad [10]$$

It may be observed as a rough approximation that the differential threshold increases in direct proportion to the angular speed of a stimulus. Discrepancies in this approximation occur in a middle range of speeds (1-5° per second), where the measured Δω falls below the empirical straight line. At faster speeds, Δω increases at an increasing rate with ω.
For separate stimuli, the least squares equation is as follows:

$$\log \Delta\omega = 1.114 + \log \omega \qquad [11]$$

Under these conditions, the differential threshold increases in direct proportion to speed from approximately 1 to 10° per second. The differential threshold is greater at slow speeds, and less at fast speeds, than the best fitting straight line of unit slope.
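With the slope fixed at 1, the least-squares intercept has a simple closed form: it is the mean of log Δω minus log ω over the plotted points. Ten raised to that intercept is the constant Weber fraction. The sketch assumes base-10 logarithms, as the plotted equations imply, and fits hypothetical data.

```python
import math

def unit_slope_intercept(speeds, thresholds):
    """Least-squares intercept c for log10(dw) = c + log10(w), slope fixed at 1.

    Minimizing sum((log dw - log w - c)**2) over c gives c = mean(log dw - log w).
    """
    diffs = [math.log10(dw) - math.log10(w) for w, dw in zip(speeds, thresholds)]
    return sum(diffs) / len(diffs)

def weber_fraction(intercept):
    """Constant Weber fraction dw/w implied by a unit-slope line on log-log axes."""
    return 10 ** intercept

# Equation 10 (adjacent stimuli): intercept -0.859.
print(round(weber_fraction(-0.859), 3))  # about 0.138

# Hypothetical thresholds scattered around dw = 0.14 * w:
w = [0.5, 1.0, 2.0, 5.0, 10.0]
dw = [0.075, 0.13, 0.29, 0.66, 1.45]
c = unit_slope_intercept(w, dw)
print(round(c, 3), round(weber_fraction(c), 3))
```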
Data obtained for superimposed stimuli can be described by a constant Weber fraction only under quite restricted conditions. Thus, for the widest field (15°) a solid line is drawn between the points for the two slowest speeds. Its equation is:

$$\log \Delta\omega = -2.893 + \log \omega \qquad [12]$$

The rapid increase in the differential threshold with speed for superimposed stimuli may be interpreted in terms of instability of the retinal image and intensity effects in individual cones.

As Zegers (1948) indicates in discussion of his results, high speeds interfere with good "pickup" of the stimuli as they appear in the visual field and, also, with adequate following movements of the eyes. The influence of extent of visual field, so marked in Figure 3, was greatly decreased, if not eliminated, when control experiments provided appropriate aids to fixation and stimulus "following." Careful measurement of the vertical distance between the curves for the 15 and 3.6°
fields indicates that they could very 
nearly be superimposed by a shift of 
0.905 log unit, the mean of separa- 
tions of 0.853, 0.947, 0.909, and 
0.909 log unit. We may infer that the 
vertical position of the curves de- 
pends primarily on stability of the 
retinal image. When stimulus condi- 
tions for good fixation of the stimulus 
are absent, the differential threshold 
function of Figure 3 is shifted uni- 
formly upward with decrease in ex- 
tent of the visual field. 
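The 0.905 log-unit shift can be checked, and translated into a ratio of thresholds, with a few lines of arithmetic. This is a minimal Python sketch that uses only the four separations quoted above; reading a vertical shift on the logarithmic ordinate of Figure 3 as a multiplicative factor on Δω is the only added step.

```python
import math

# Vertical separations (log units) between the 15° and 3.6° field curves,
# as quoted in the text.
separations = [0.853, 0.947, 0.909, 0.909]

# Mean separation in log units.
mean_shift = sum(separations) / len(separations)

# On a log ordinate, an upward shift of s log units multiplies Δω by 10**s.
threshold_factor = 10 ** mean_shift

print(f"mean shift = {mean_shift:.4f} log unit")
print(f"Δω (3.6° field) ≈ {threshold_factor:.1f} × Δω (15° field)")
```

In other words, with poorer fixation conditions in the narrower field, the whole threshold function sits roughly eight times higher.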

The shape of the curves for super- 
imposed stimuli appears to be de- 
pendent upon the intensity effects oc- 
curring in individual cones. Evidence 
for this inference is less direct than 
Zegers’ control experiments involv- 
ing improved conditions for fixation 
and pursuit of the stimulus. How- 
ever, it should be pointed out that 
Graham, Baker, Hecht, and Lloyd 
(1948) measured the differential 
threshold as a function of the lumi- 
nance of the stimulus field. Neutral 
tint filters were placed behind the 
metal tube through which the observer saw two needles, one above the other, moving at constant and equal speeds back and forth across an illuminated field. Measurements of the precision of distance settings with one needle yielded differential angular speeds for different luminances of the visual field. The decrease in Δω as a function of the increase in luminance is described by Hecht's intensity discrimination equation, upon the assumption that Δω is a measure of differences in diffraction luminances and provides a ΔI seen against the general illumination, I.

In addition, measurements of the 
threshold luminance for a moving 
spot of light indicate that the in- 
tensity effect of speed is similar to the 
parallax effect of Figure 3. At mod- 
erate speeds, the threshold luminance 
for discrimination of motion increases 
in direct proportion to the speed 
(Brown, 1958). At faster speeds 
(greater than 10° per second), the 
luminance threshold increases at a disproportionate rate until it approximates an asymptote at a limiting speed (30 to 40° per second). Such a relationship, like that found for superimposed stimuli in Figure 3, may be interpreted in terms of intensity effects occurring in individual cones. As angular speed increases, the duration of passage of the stimulus across a given cone is shortened. Since the intensity effect in each receptor unit is lessened, the luminance of the moving spot or the difference in angular speed of the stimuli must be increased.


THE WEBER RATIO 


The Weber ratio provides a convenient measure by means of which these discriminations may be compared with other sensory discriminations and with performance in tracking and predicting. The ratio of the differential threshold (Δω) to the magnitude of the standard (ω) may be readily calculated from Equations 10–12 for adjacent, separate, and superimposed stimuli. The best estimate of Δω/ω for an unspecified ω is 0.138 for adjacent stimuli and 0.0769 for separate stimuli. This difference has been confirmed by Notterman (1959) in measurements made with an oscilloscope using both procedures. Since his experiment excludes variations in stimulus conditions other than the spatial order, Notterman's interpretation of the difference is of particular interest:

Subjects in the adjacent presentation case 
can base their discrimination on a comparison 
of the amount of time taken to traverse the 
initial and final 1} inches on the scope face, 
or—and this is important—they can disregard 
time and look for the jerk which occurs when 
the moving spot instantaneously increases its 
velocity. The subjects employing the sepa- 
rate presentation procedure do not have this 
option: since the standard and comparison 
stimuli are separated in time, there is no jerk. 
In short, the subjects of the (adjacent) pro- 
cedure may have changed the problem from 
one requiring a comparison of two velocities, 
to one requiring a judgment of the presence 
or absence of jerk (p. 3). 
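Because Equations 10–12 all have unit slope, each intercept c fixes the Weber fraction directly: $\log \Delta\omega = c + \log \omega$ implies $\Delta\omega/\omega = 10^{c}$ at every speed. The minimal Python sketch below recovers the ratios quoted in this section from the three least-squares intercepts. The dictionary labels and the 2° per second example speed are illustrative choices; the speed is kept below 5° per second because Equation 12 applies only to the slowest speeds.

```python
# Least-squares intercepts of Equations 10-12 (log Δω = c + log ω).
INTERCEPTS = {
    "adjacent": -0.859,      # Equation 10
    "separate": -1.114,      # Equation 11
    "superimposed": -2.893,  # Equation 12 (15° field, two slowest speeds)
}


def weber_fraction(intercept):
    """With unit slope, Δω/ω = 10**c for every ω."""
    return 10 ** intercept


def threshold(intercept, speed):
    """Predicted differential threshold Δω (deg/s) at speed ω (deg/s)."""
    return weber_fraction(intercept) * speed


for condition, c in INTERCEPTS.items():
    print(f"{condition:>12}: Δω/ω = {weber_fraction(c):.4g}, "
          f"Δω at 2°/s = {threshold(c, 2.0):.4g}°/s")
```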


The marked superiority of super- 
imposed stimuli in yielding a low 
Weber fraction is illustrated by the 
value of 0.00128 for two needles 
traversing an extent of 15° at angular 
speeds less than 5° per second. This 
superiority is readily understandable. 
Superimposition of one needle in 
front of the other provides an angular 
offset which Zegers has found to be a 
basic determiner of the differential 
angular speed threshold. The angular 
offset is absent when stimuli are pre- 
sented adjacently in immediate suc- 
cession or separately in space and 
time.

[Figure 4: curves for adjacent, separate, and superimposed stimuli; abscissa: angular speed ω (degrees/sec).]
Fig. 4. The Weber ratio (Δω/ω) as a function of the angular speed (ω) for discriminations utilizing adjacent, separate, and superimposed stimuli.

Variation of the Weber fraction over the whole speed range is plotted in Figure 4. The points represent
geometric means of values deter- 
mined by different investigators at 
approximately the same angular 
speeds. Thus, the top curve for ad- 
jacent stimuli is the average of values 
obtained by Hick (1950) and Notter- 
man and Page (1957). Except for the 
point at the slowest speed (Hick) 
and at the fastest speed (Notterman 
and Page), each point is the geometric mean of the Weber fractions from both studies. A similar procedure
has been followed in averaging meas- 
urements made with separate stimuli. 
For superimposed stimuli, the Weber 
fraction has been calculated directly 
from Zegers’ data. In this case, the 
ratio is directly proportional to the 
angular offset existing between the 
reference and comparison stimuli. 
As Zegers has indicated, the value of 
the angular offset (and the Weber 
ratio) increases with speed. 
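Averaging Weber fractions from different investigators by the geometric mean, as was done for Figure 4, amounts to averaging their logarithms, which suits ratios plotted on logarithmic coordinates. A minimal Python sketch follows; the two input values in the example are hypothetical and are not figures taken from Hick or from Notterman and Page.

```python
import math


def geometric_mean(values):
    """Geometric mean = exp(mean of natural logs); values must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical Weber fractions from two studies at about the same speed.
print(geometric_mean([0.12, 0.16]))  # ≈ sqrt(0.12 * 0.16)
```

Python's standard library also provides `statistics.geometric_mean` (Python 3.8 and later), which could be used instead of the explicit log average.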
Examination of the curves of 
Figure 4 suggests a useful empirical 
generalization. The Weber fraction 
for nonsuperimposed stimuli is approximately constant in the midrange of angular speeds. Thus, in the range of 0.1 to 20° per second, Δω/ω changes by little more than a factor of two. For adjacent stimuli, the maximal ratio is only 2.2 times the minimal ratio; for separate stimuli, the change is by a factor of 1.9. Although the
Weber fraction may be fairly con- 
stant in the middle range of stimulus 
values, the rapid rise of the curve for 
superimposed stimuli suggests that 
the ratio may increase markedly at 
extremes. 

The constancy of the Weber ratio for differential speed thresholds may be interpreted at a descriptive level for comparison with other sensory discriminations. Woodworth and Schlosberg (1954) have indicated that for many sensory discriminations the differential threshold is a measure of the variability of the effects of stimulation, i.e., $\Delta\omega = K\sigma$, where σ is the standard deviation of those effects. For discriminations of motion, according to Brown (1960), the variability in turn is proportional to the speed, i.e., $\sigma = C\omega$. Substituting, $\Delta\omega/\omega = KC$, a constant. It is therefore not surprising that Δω/ω is constant, at least within limits which are not too well defined in Figure 4.
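The argument can also be illustrated numerically. The minimal Python sketch below assumes, purely for illustration, normally distributed effects of stimulation with standard deviation σ = Cω and a threshold Δω = Kσ. The values C = 0.05 and K = 2.0 are hypothetical, not estimates from the studies cited. Whatever speed is chosen, the estimated ratio Δω/ω stays near KC = 0.1.

```python
import random
import statistics


def simulated_weber_ratio(speed, c=0.05, k=2.0, n=20_000, seed=1):
    """Estimate Δω/ω when σ = C·ω and Δω = K·σ (hypothetical C and K)."""
    rng = random.Random(seed)
    # Sampled effects of stimulation: mean ω, spread proportional to ω.
    effects = [rng.gauss(speed, c * speed) for _ in range(n)]
    sigma = statistics.stdev(effects)  # estimated variability σ
    delta_omega = k * sigma            # differential threshold Δω = Kσ
    return delta_omega / speed


for omega in (0.1, 1.0, 10.0, 20.0):  # deg/s, the range discussed above
    print(f"ω = {omega:>4} °/s  Δω/ω ≈ {simulated_weber_ratio(omega):.4f}")
```

The constancy here is built in by the two proportionality assumptions; the sketch shows only that those assumptions are sufficient, not that they hold outside the middle range of speeds.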

It is of interest to compare the 
magnitude of the ratio with that for 
other discriminations. Under op- 
timal conditions, the minimal Weber 
fraction with superimposed stimuli 
is comparable to that measured for 
pitch discrimination with a standard 
tone and a comparison tone differing 
slightly in frequency. Measurements 
of pitch discrimination indicate that 
Weber's fraction is constant at about 
0.002 beyond 250 cycles per second, 
rising somewhat at the lower fre- 
quencies. The differential speed 
threshold ratio, as measured with 
separate stimuli, is comparable to 
the Weber fraction for lifted weights: when measured by lifting weights successively with one hand, the
Weber ratio is approximately 0.075 
for weights greater than 200 grams. 
We may conclude that the Weber 
ratio for differential speed thresholds 
not only is constant in a medium 
range of stimulus values, but also is of 
the same order of magnitude as that 
found for other discriminations. 
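Weber fractions translate into differential thresholds by multiplying by the standard. The minimal Python sketch below uses the fractions quoted in this section. The particular standards (1000 Hz, 400 g, 10 and 2° per second) are hypothetical examples, chosen to lie within the ranges over which the text says each fraction holds.

```python
# Weber fractions quoted in the text, each with an illustrative standard
# (the standards are hypothetical examples inside the stated ranges).
CASES = [
    ("pitch, above 250 Hz", 0.002, 1000.0, "Hz"),
    ("lifted weights, above 200 g", 0.075, 400.0, "g"),
    ("angular speed, separate stimuli", 0.0769, 10.0, "deg/s"),
    ("angular speed, superimposed, below 5 deg/s", 0.00128, 2.0, "deg/s"),
]


def differential_threshold(weber_fraction, standard):
    """Under a constant Weber fraction, the threshold is k times the standard."""
    return weber_fraction * standard


for label, k, standard, unit in CASES:
    delta = differential_threshold(k, standard)
    print(f"{label}: standard {standard:g} {unit}, threshold ≈ {delta:.4g} {unit}")
```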


TRACKING BEHAVIOR 


Studies of tracking behavior illus- 
trate an application of the Weber 
ratio for differential speed thresholds. 
This application is of particular interest since earlier reviews have emphasized the significant motor characteristics of tracking behavior (Birmingham & Taylor, 1954; Fitts, 1951). Perceptual characteristics have been implied by occasional observations that an operator tracks a target quickly and efficiently under optimal conditions because he estimates its present speed and acceleration and thereby anticipates its future motion. During World War II, for example, the systematic investigation of the manual controls for antiaircraft fire control systems indicated the anticipatory nature of tracking, as discussed by Helson (1949).

Foxboro studies. In the Foxboro studies directed by Helson, error was recorded for compensatory tracking, in which the tracker tries to keep a moving pointer aligned as closely as possible with a stationary reference pointer. Compensatory tracking may be contrasted with pursuit tracking, in which both pointers move and the tracker aligns the following cursor under his control with the moving target pointer. In the Foxboro studies the tracker compensated for the displacement of a moving pointer, representing the aiming point, from the actual position indicated by a stationary pointer. Tracking error was
measured by the time required for 
the target to move from its actual 
position to the aiming point. 

Speed of the handwheel rotation 
was a major variable controlling 
tracking accuracy. For a constant- 
speed unidirectional course, with an 
increase in rate of cranking, the 
tracking error decreased from 55 to 6 
milliseconds when a light handwheel 
of 2.25-inch radius was used (Fox- 
boro Company, 1943a). Since the 
tracking error was consistently of the 
order of milliseconds and could be as 
small as one hundredth of the fastest 
reaction time, it is evident that the 
tracker anticipated the future motion 
of the target and thereby avoided the 
series of oscillations his long reaction 
time would otherwise produce. 

For simple sinusoidal courses, the 
tracker not only anticipated the mo- 
tion of the target but also used an 
averaging motion of the handwheel 
when the course was of too high a 
frequency to follow exactly. As 
course frequency increased, the 
tracker eliminated terminal portions 
of swings. Inertia in the form of a 
heavy handwheel or a flywheel effect 
smoothed the direct tracking of 
courses not requiring high accelera- 
tions and rapid reversals in direction 
(Foxboro Company, 1943b). In addi- 
tion, the averaging type of behavior 
was dependent upon practice and 
familiarity with the course being 
tracked.

Contemporary models for tracking 
behavior. Since World War II, the 
concept of feedback mechanisms has 
been generalized to the entire field of 
control and communication theory in 
machines and animals (Wiener, 1948). 
As applied to antiaircraft fire control 
behavior, the concept states that the 
tracker uses the difference between 
the stimulus of a target’s motion and 
his response as a new input to make
his motion correspond more closely 
to that of the target. Engineers 
analyzed human tracking per- 
formance in terms of simple servo 
systems with feedback (James, Nichols, & Phillips, 1947; Ragazzini,
1948; Tustin, 1947). Stimulated by 
the mathematical systems equations 
which emerged from this analysis, 
psychologists have developed their 
own models to describe the behavior 
involved in minimizing the difference 
between two positions with control 
of one (Birmingham & Taylor, 1954; 
Fitts, 1951; Noble, Fitts, & Warren, 
1955). These models make two basic 
assumptions: intermittency of re- 
sponse, and predictiveness of re- 
sponse. 

Intermittency of tracking responses.
Despite the smooth and apparently 
continuous appearance of efficient 
tracking, experimental evidence from 
several sources indicates that the 
tracker responds intermittently. 
First, a time record of tracking per- 
formance shows a typical periodicity 
with a predominant frequency of two 
responses per second (Craik, 1947; 
Ellson, Hill, & Gray, 1947). Second, 
analysis of the response patterns to a 
step input displacement of position 
shows that quick corrective move- 
ments occur without visual or kines- 
thetic guidance and that the typical 
time for completing a corrective 
movement, including reaction time, 
is approximately 0.5 second (Cherni- 
koff & Taylor, 1952; Searle & Taylor, 
1948; Taylor & Birmingham, 1948). 
Third, the assumption that the 
tracker responds intermittently at 
0.5-second intervals during continu- 
ous tracking agrees with the optimal 
time constant obtained for conven- 
tional aided tracking (Birmingham & 
Taylor, 1954; Mechler, Russell, & 
Preston, 1949). Fourth, with the as- 


sumption of 0.5-second intermittency 
of corrections, one may predict the 
optimal time constants for more com- 
plex aided-tracking control systems 
involving an acceleration component 
as well as the conventional position 
and rate controls (Searle, 1951). 
Predictiveness of tracking responses. 
The assumption of predictiveness in 
tracking responses is supported by 
the following findings. First, the 
Foxboro studies showed that the 
time error for manual handwheel 
tracking is much less than the reac- 
tion time, as discussed above. Sec- 
ond, pursuit tracking usually yields 
lower error scores than compensa- 
tory tracking (Chernikoff, Birming- 
ham, & Taylor, 1955; Poulton, 1952; 
Senders & Cruzen, 1952). In the pursuit mode of tracking, responses may be made on the basis of a predictable
course of the target since its marker 
moves independently of the marker 
with which the tracker follows. In 
the compensatory mode, prediction 
must be limited to the tracking error 
since the tracker attempts to stabilize 
a moving marker representing the 
difference between target motion and 
his own control motion. Third, 
Chernikoff et al. (1955) found that an 
aided-tracking control impairs per- 
formance for the pursuit mode but 
materially improves it for the com- 
pensatory situation. They resolved 
this apparently paradoxical finding 
by considering the nature of aided- 
tracking controls in terms of the pre- 
dictiveness of tracking responses. 
With a position control, the position 
of the moving marker controlled by 
the tracker is directly proportional 
to the position of his control. With 
aided tracking, a movement of the 
control not only causes a proportional change in the position of the
marker, but also introduces a change 
in its rate of motion. The aided-tracking time constant is given by the ratio of the control sensitivities.
With the proper time constant in 
compensatory tracking, the operator 
can correct an error with a control 
motion proportional to the position 
component of the error. He thereby 
sets in changes in rate of motion in 
amounts that are correct on the aver- 
age to match the target motion. 
Use of the aided control in pursuit 
tracking requires that the tracker 
ignore target velocity and not at- 
tempt to predict future position. 
Later experiments by Chernikoff 
and Taylor (1957) have indicated an 
effect of target speed on the optimal 
time constant for both pursuit and 
compensatory tracking. 
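The aided-tracking relation described above can be sketched as a simple discrete-time model. The sketch assumes a form often used to describe aided tracking, in which marker position is a·x(t) + b·∫x dt, where x is control displacement, a is the position sensitivity, and b is the rate sensitivity; the function name, gains, and sample interval are hypothetical. After a step of the control, the marker jumps by a·x and then drifts at rate b·x, so after T = a/b seconds the rate component has added as much displacement as the position component.

```python
def aided_marker_positions(control, dt, position_gain, rate_gain):
    """Marker positions under an assumed aided-tracking law.

    y(t) = position_gain * x(t) + rate_gain * (running integral of x),
    so the aided-tracking time constant is position_gain / rate_gain.
    """
    positions = []
    integral = 0.0
    for x in control:
        integral += x * dt  # rectangle-rule approximation of the integral of x
        positions.append(position_gain * x + rate_gain * integral)
    return positions


# Hypothetical gains giving a 0.5-second time constant.
a, b, dt = 2.0, 4.0, 0.01
time_constant = a / b
step = [1.0] * 100  # control moved by one unit at t = 0 and held for 1 s
y = aided_marker_positions(step, dt, a, b)
# After time_constant seconds (50 samples) the rate term equals the position term.
print(time_constant, y[49])
```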

Tracking error. With verification of the assumptions of intermittency and predictiveness for tracking performance, it is evident how differential speed thresholds limit the tracker's responses with a position control. It may be assumed that at a given instant in time the tracker is exactly on target but that his cursor and the target are moving at different speeds. During a short period of time, the position error generated is approximately the product of this speed difference and the temporal interval. Since response intermittency holds the temporal interval constant, the tracking error is directly proportional to the speed difference which the tracker can discriminate.
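The argument can be written out as a short derivation. Assume, as above, that the tracker is exactly on target at t = 0, that his cursor speed differs from the target speed ω by the smallest difference he can discriminate, $\Delta\omega = k\omega$ (k being the Weber fraction), and that no correction occurs until the next response, T seconds later. The position error then grows linearly across the interval:

$$e(t) = \Delta\omega\, t = k\,\omega\, t, \qquad 0 \le t \le T$$

$$e(T) = k\,\omega\,T, \qquad \bar{e} = \frac{1}{T}\int_0^T k\,\omega\,t\,dt = \frac{k\,\omega\,T}{2}$$

With T held constant by response intermittency (about 0.5 second), both the end-of-interval error and the average error are proportional to target speed.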

Speed of target motion seems to have the same effect on tracking error as it has on the differential speed threshold as measured with nonsuperimposed stimuli, i.e., tracking error increases as a linear function of speed. Bowen and Chernikoff (1958) have investigated the relationship between magnification, speed of target motion, and tracking error with a compensatory position-control system. Both with and without magnification, measures of tracking performance did not vary for a constant target speed when the frequency and amplitude of motion were varied over a range useful in tracking research. Tracking error increased with an increase in average speed of target motion. Departures from a linear relationship were not large.


PREDICTIVE BEHAVIOR 
Prediction Motion 

Data from Gottsdanker’s series of 
studies of prediction motion demon- 
strate a marked similarity of pursuit 
tracking error to the differential 
speed threshold for adjacent stimuli. 
The average error a tracker makes in following a target which moves at a constant speed but suddenly disappears is similar to the differential threshold (approximately 14% of the speed). During the second following
the disappearance, the tracker main- 
tains the speed with an average error 
of 13, 14, and 16%, as measured in 
three separate studies by Gotts- 
danker (1952a, 1952b, 1955). 

On some trials when the target was 
accelerating or decelerating at the 
moment of disappearance, the tracker 
did not continue the uniform change 
in speed. It should be noted, how- 
ever, that at the moment of disap- 
pearance the change in speed for a 
0.5-second interval was only 5 to 7% 
of the speed and presumably was be- 
low the tracker’s threshold. Gotts- 
danker (1956) has reviewed the ex- 
perimental literature on responses to 
acceleration of target motion. He 
concluded that smoothly accelerated 
motion is generally responded to as if 
the speed were constant, i.e., the
change in speed did not exceed the 
differential speed threshold in the 


studies cited. 
Gottsdanker (1952a) has measured the tracking error not only for disappearing targets, but also for completed courses. The measured error is consistent with one calculated on the basis of the assumptions of a 0.5-second intermittency in response and a 14% speed threshold. The average error in tracking a target moving at a constant speed of 8 millimeters per second was 0.50 millimeter. If the tracker were exactly on target at a given instant, his error a half-second later would be calculated from the assumptions as the product 0.14 × 8 × 0.50, or 0.56 millimeter, and the average error during the interval should be half of this, 0.28 millimeter. It may be assumed more realistically that the tracker was not exactly on target at the beginning of the interval, in which case the average error should be correspondingly greater than the minimal value of 0.28 millimeter, as the measured value of 0.50 millimeter is.
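The calculation above can be reproduced with a small model of one correction interval. This minimal Python sketch assumes, as the text does, a 0.5-second interval and a 14% speed threshold. The `start_error` argument is a hypothetical offset, used only to illustrate why a tracker who is not exactly on target at the start of the interval shows a larger average error.

```python
def interval_errors(weber, speed, interval, start_error=0.0, steps=1000):
    """Final and average position error over one correction interval.

    The error grows linearly: e(t) = start_error + weber * speed * t.
    The average uses midpoint sampling, which is exact for a linear e(t).
    """
    speed_difference = weber * speed
    dt = interval / steps
    samples = [start_error + speed_difference * (i + 0.5) * dt for i in range(steps)]
    final = start_error + speed_difference * interval
    return final, sum(samples) / steps


# Gottsdanker (1952a): target at 8 mm/s, 14% threshold, 0.5-s intermittency.
final, mean = interval_errors(0.14, 8.0, 0.5)
print(f"error after 0.5 s = {final:.2f} mm, average error = {mean:.2f} mm")

# A hypothetical 0.2-mm offset at the start of the interval raises the average.
print(interval_errors(0.14, 8.0, 0.5, start_error=0.2))
```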

The prediction of tracking error 
from the Weber ratio for speed dis- 
criminations is not limited to visually 
presented stimuli, but may be ex- 
tended to other stimuli. Gottsdanker 
(1954) has measured the precision of 
tapping at a constant rate of two per 
second. He found that subjects 
could maintain this rate to an ac- 
curacy of 2.4% when the stimulus of 
pops from a magnetic tape playback 
was removed. In the Foxboro studies 
it was found that the tracker could 
utilize the increased precision of 
rapid repetitive movements in fast 
handwheel cranking over the inter- 
mittent corrective responses of slower 
handwheel turning. As an approxi- 
mation, the tracking error should be 
limited by the product of the repeti- 
tion rate threshold and the time for 
each repetitive movement at the 
faster speeds. For example, the time 
error should be the product of 0.024 
and 0.25 second per repetitive movement for cranking at 120 rpm. This
value coincides exactly with the 
measured time error of 6 milliseconds 
for the light handwheel with short 
radius. 


Prediction of Future Positions 
of a Moving Target 


Although the differential speed 
threshold would seem to be clearly 
related to predictions of future posi- 
tion of a moving object, data on the 
nature of the relationship are limited. Slater-Hammel (1955) had subjects observe a marker moving at a uniform speed over different
display distances and then had them 
estimate when the marker would 
complete traversing different target 
distances. The display distance did 
not affect the error in time of estimat- 
ing the arrival of the uniformly mov- 
ing marker at a specified point in 
space. However, the error increased 
systematically with an increase in 
the target distance which the marker 
traversed after disappearing. In 
terms of percentage of the required 
time, the error varied between 8.9% 
and 21.6%. These values agree with 
expectations based on the Weber 
ratio for speed discriminations with 
nonsuperimposed stimuli (cf. Figure 
4). 

Morin, Grant, and Nystrom (1956) 
have reported similar results despite 
two important differences in their 
experimental procedure. First, instead of Slater-Hammel's stimulus
which moved continuously at a con- 
stant speed, Morin et al. used the 
successive illumination of cue lights 
which were placed at even intervals 
in a horizontal row. After illumina- 
tion of the last cue light, the subject 
estimated the time it would take the 
imaginary moving object to reach a 
target light. Second, the object
traveled at a rather slow computed 
speed of either 0.179 or 0.358° per 
second rather than the speed of ap- 
proximately 5° per second used by 
Slater-Hammel. Results obtained by 
Morin et al. confirmed the fact that 
the error of estimating arrival in- 
creases with target distance. Signif- 
icantly, they also found for their 
faster speed that the mean errors of 
estimation were generally less than 
10% of the computed time. When 
the speed was 0.179° per second, the 
mean errors of estimation ranged 
from 25 to 53%. These values appear consistent with an extrapolation of the data presented in Figure 4 to slow speeds.

Garvey, Knowles, and Newlin 
(1956) have measured the accuracy 
of prediction in terms of deviations 
in range and bearing between esti- 
mated and actual position plots on 
four different radar displays. They 
found that accuracy of estimated 
position was a function of target
speed, i.e., the faster the motion of 
the target the less accurate the es- 
timate. This relationship resembles 
that between Δω and ω of Figure 3.

Gottsdanker and Edwards (1957) 
have studied a more complex type of 
prediction situation. Two targets moved down perpendicular paths towards an intersection but disappeared before reaching it. The subject estimated where one target would be when the other crossed the intersection. Gottsdanker concluded that for both accelerated and constant-speed targets the prediction
was based on relative positions at 
time of the target's disappearance 
rather than on relative speeds or 
accelerations. 


SUMMARY 


Measurements of the differential speed threshold (Δω) have been plotted against speed (ω) for comparison stimuli which were presented adjacent, separate, or superimposed. As a rough approximation, the threshold increases in direct proportion to speed for nonsuperimposed stimuli over a range from 0.1 to 20° per second ($\Delta\omega = K\omega$). Although the relationship for superimposed stimuli (monocular parallax) is similar, inadequate ocular following movements and receptor intensity effects modify the relationship at fast speeds (greater than 5° per second). Estimates of the Weber ratio (Δω/ω) of 0.138 for adjacent stimuli and of 0.0769 for separate stimuli provide a basis for interpretation of tracking and other predictive behavior. Experiments support the assumptions of intermittency and predictiveness of responses in tracking. With these assumptions, error in performance may be calculated for relatively simple tasks from the Weber ratio. For more complex tasks, constancy of the Weber ratio agrees with the linear relationship found between tracking error and speed of target motion.
SELF-ACCEPTANCE AND SELF-EVALUATIVE BEHAVIOR:
A CRITIQUE OF METHODOLOGY
Ohio State University


“Self-acceptance” has become a popular concept in psychological literature. Along with “rigidity,” “authoritarianism,” and “conformity,” it has come to particular prominence in the last decade, perhaps reflecting an evolution in value systems
in American culture. Concepts per- 
taining to the self have been given 
considerable space in the writings of 
personality theorists and social-per- 
sonality psychologists and inevitably 
have found their way into psycho- 
logical research. 

Self-acceptance has been particu- 
larly identified with Rogers’ person- 
ality theory and is accorded the 
status in that system of a major 
therapeutic goal. Phenomenological 
research on self-acceptance dates 
from the classic study of Raimy 
(1948). However, very similar con- 
cepts have played dominant roles in 
other theories—e.g., Snygg and 
Combs (1949), Horney (1950), and 
Sullivan (1953). More important, 
self-acceptance seems to have been 
pre-empted for less systematic, ec- 
lectic usage by a great many practic- 
ing clinicians and researchers (Cowen, 
1956; Cowen, Heilizer, Axelrod, & 
Alexander, 1957; Zuckerman, Baer, 
& Monashkin, 1956; Zuckerman & 
Monashkin, 1957). The major portion of the research on self-acceptance
derives from Rogers’ self-theory, but 


1 The authors would like to express their in- 
debtedness to the following persons, who 
critically read this paper and made a number 
of valuable suggestions: Donald Campbell, 
Shephard Liverant, Julian Rotter, Lee Sech- 
rest, Charles Smock, and Janet Taylor. 
studies based on other theories (Block
& Thomas, 1955; Sarbin & Rosen- 
berg, 1955) and the generally empiri- 
cal investigations referred to above 
attest to the breadth of current 
interest in the behaviors subsumed 
under this broadly interpreted con- 
struct. 

While no single definition of self- 
acceptance would be accepted by all 
who use the term, the phenomeno- 
logical view of Rogers seems to rep- 
resent at least a common point of 
departure. From the definition of a self-concept construct, the concept of self-acceptance is derived; it refers, at least operationally, to the extent to which this self-concept is congruent with the individual's description of his “ideal self.”

The majority of self-acceptance 
tests have followed this model (see 
Table 1). A somewhat different 
psychometric model has been pro- 
posed by Gough (1955), in which self- 
acceptance is inferred from the ratio 
of “favorable” self-descriptive state- 
ments to the total number of self- 
descriptive statements made by the 
subject. 
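The two psychometric models differ in what they actually compute. As a rough illustration, the minimal Python sketch below scores a self-ideal congruence index as a Pearson correlation between self and ideal placements of the same items (the Q-sort model) and Gough's ratio of favorable adjectives to all adjectives checked. The item placements, adjectives, and favorable set are invented for the example; real instruments use their own item pools and scoring keys.

```python
import math


def self_ideal_correlation(self_ratings, ideal_ratings):
    """Pearson r between self and ideal placements of the same items."""
    if len(self_ratings) != len(ideal_ratings) or len(self_ratings) < 2:
        raise ValueError("need two equal-length lists of at least two items")
    n = len(self_ratings)
    mean_s = sum(self_ratings) / n
    mean_i = sum(ideal_ratings) / n
    cov = sum((s - mean_s) * (i - mean_i) for s, i in zip(self_ratings, ideal_ratings))
    var_s = sum((s - mean_s) ** 2 for s in self_ratings)
    var_i = sum((i - mean_i) ** 2 for i in ideal_ratings)
    return cov / math.sqrt(var_s * var_i)


def gough_ratio(checked, favorable):
    """Favorable adjectives checked divided by all adjectives checked."""
    if not checked:
        raise ValueError("no adjectives checked")
    return sum(1 for adjective in checked if adjective in favorable) / len(checked)


# Hypothetical data for one subject.
self_sort = [7, 5, 3, 6, 2, 4]
ideal_sort = [8, 6, 2, 7, 1, 5]
print(round(self_ideal_correlation(self_sort, ideal_sort), 3))

checked = ["calm", "honest", "tense", "friendly"]
print(gough_ratio(checked, favorable={"calm", "honest", "friendly", "kind"}))
```

The first score depends on the relation between two descriptions, the second only on the favorability of one, which is one concrete sense in which the operations are not obviously interchangeable.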

A common denominator in the definition of self-acceptance, judging from the operations employed in its assessment, would seem to be the degree of self-satisfaction in self-evaluation. This definitional consensus, however, is achieved at the level of operations, and other meanings may be implied by self-acceptance constructs. Phenomenological theorists, for example, appear to be interested in an “internal” phenomenal state. Other theorists (Block & Thomas, 1955) have formulated self-acceptance as a function of an ego-control construct. The phenomenological concept of Rogers and the psychoanalytic set of meanings implied by Block and Thomas' construct of ego control probably diverge in several respects. The purpose here, however, is merely to illustrate the point that emphasis on definitional clarity achieved at an operational level tends to ignore the probably significant differences in the implied theoretical meanings of self-acceptance.

TABLE 1
CLASSIFICATION OF SOME TESTS OF SELF-ACCEPTANCE

| Name of Test | Type | Score Obtained |
|---|---|---|
| SIO (self-ideal-other) Q sort (Rogers & Dymond, 1954) | Q sort | Pearson correlation between sorts of self and ideal on 100 items. Also, “adjustment score” based on number of favorable statements placed on “like me” end of distribution and number of unfavorable statements placed on “unlike me” end. |
| Index of Adjustment and Values (Bills, 1958; Bills, Vance, & McLean, 1951) | Adjective rating scale | Self-acceptance score = sum of self-concept ratings (1–5 scale) on 49 traits. Also, a self-ideal discrepancy score is calculated. Norms available. |
| Adjective Check-List (Gough, 1955) | Adjective check list | Self-acceptance score = number of favorable adjectives checked divided by total number of adjectives checked. |
| Buss scale (Buss & Gerjuoy, 1957; Zuckerman & Monashkin, 1957) | Adjective check list | Sum of differences, without regard to sign, of scale values (based on psychologists' ratings) of adjectives checked on self and ideal descriptions. |
| Self-Rating Inventory (Brownfain, 1952) | Self-rating scale | “Positive self-concept” and “negative self-concept” scores. Self-acceptance = sum of positive self-concept description weights minus negative self-concept description weights, disregarding sign. |
| Attitudes toward Self and Others Questionnaire (Phillips, 1951) | Self-rating scale | Sum of weights (1–5) on each item. Norms available. |
| Berger Self-Acceptance scale (Berger, 1952) | Self-rating scale | Sum of item weights (1–5). |
| Interpersonal Check List (LaForge & Suczek, 1955) | Adjective check list | Intensity scale values for each adjective (1–4). Self-acceptance = discrepancy between self and ideal ratings. |

Reflecting in part the widespread 
interest in self-acceptance are the 
numerous instruments which have 
been devised to measure the con- 
struct. A striking phenomenon of 
research in this area is that these 
tests, characterized by a diversity of 
both theoretical and psychometric 
models, have apparently been as- 
sumed to be interchangeable. Thus, 
characteristic of self-acceptance research appears to be a basic conception that measures of this construct
possess face validity: that is, in a 
simple denotative sense, the tests 
are viewed as being manifestly similar 
(Peak, 1953). 

Criterion validation of self-accept- 
ance tests is, of course, logically im- 
possible, and attempts at construct 
validation do not inspire much confidence in
the validity even of a particular test, 
much less of all the different tests. 
Face validity, however, has appar- 
ently been assumed without question. 
The acceptance of face validity— 
that is, manifest similarity—implies 
adherence to a further assumption 
incorporated in phenomenological 
theory—that of the validity of self- 
reports (Rogers, 1951, p. 494). In 
terms of these assumptions, a self- 
acceptance test is valid if it looks like 
a self-acceptance test and is similar 
to other tests, and what a person says 
about himself self-evaluatively is ac- 
cepted as a valid indication of how 
he “really” feels about himself. 

The acceptance of these assump- 
tions, whether acknowledged or im- 
plicit, has definite implications for 
the assessment of self-acceptance and 
for the interpretation of experimental 
results in this area. This paper will 
show that there are four major prob- 
lems in the measurement of this con- 
struct and that, in view of the com- 
mon adherence to these assumptions, 
the results of studies on self-accept- 
ance are rendered highly ambiguous. 
These issues seem, despite their 
essential pertinence to research on 
self-acceptance, to have been suffi- 
ciently ignored to warrant exposition 
in this paper. It will be seen that 
these issues are not limited solely to 
self-acceptance, but represent in- 
stead basic logical and psychometric 
considerations which may serve to 
illustrate problems in personality research in general.


EQUIVALENCE OF OPERATIONS 


As observed above, the diverse 
tests of self-acceptance have been 
assumed to be equivalent operations 
for measuring behaviors subsumed 
under the construct. The failure of 
experimenters to consider the problem 
of the equivalence of assessment 
operations in published reports (Bills, 
Vance, & McLean, 1951; Block & 
Thomas, 1955; Calvin & Holtzman, 
1953; Cowen, Heilizer, & Axelrod, 
1955; Hillson & Worchel, 1957; 
Phillips, 1951) raises the question of 
the basis on which the findings of 
individual studies employing differ- 
ent measuring operations are gen- 
eralized and incorporated in the 
larger body of self-acceptance re- 
search. The basis of generalization, 
in view of the absence of explicit con- 
sideration of the question, must be 
inferred to lie in the assumption of 
face validity as defined above. Even 
statements implying differences 
among self-acceptance tests fail to 
deal with the logically sequential 
question of the extent to which these 
differences may mean that self-ac- 
ceptance as measured by Test 1 is not 
the same as self-acceptance as meas- 
ured by Test 2. The following excerpt 
illustrates this point (Cowen et al.,
1955): 

Presumably each of these classes of [self-acceptance] measures has certain peculiar advantages and limitations. . . . In any case, a good many data have now been presented demonstrating some empirical validity for both types of measures (i.e., they can discriminate among subjects with respect to other personality and behavioral indices in a manner roughly consistent with predictive expectations based on phenomenological theory) (p. 242).


These writers do not make clear what relationship obtains between the classes of self-acceptance tests
(tests yielding discrepancy scores 
versus self-concept rating devices) 
or, more basically, how phenomeno- 
logical personality theory can lead to 
operations that apparently can satisfy 
certain predictions in the case of one 
class of instruments but requires 
different operations to obtain posi- 
tive results from other hypotheses 
based on the same construct. 
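The difference between the two classes of instruments can be made concrete with a small scoring sketch. The Python below is a minimal illustration, assuming hypothetical 1-5 item weights, a simple sum for the self-concept rating device, and a sum of absolute self-ideal differences for the discrepancy score; the function names and both scoring rules are assumptions, not the keys of the IAV or of any other published test. Two subjects who describe themselves identically obtain the same rating-device score but different discrepancy scores, so the two kinds of score need not be interchangeable even in principle.

```python
# Hypothetical 1-5 item weights for four items; not the key of any published test.


def rating_total(self_ratings):
    """Self-concept rating device: sum of the weights the subject assigns himself."""
    return sum(self_ratings)


def discrepancy_score(self_ratings, ideal_ratings):
    """Self-ideal discrepancy: sum of absolute self-minus-ideal differences.

    Smaller values are read as greater self-acceptance.
    """
    if len(self_ratings) != len(ideal_ratings):
        raise ValueError("self and ideal ratings must cover the same items")
    return sum(abs(s - i) for s, i in zip(self_ratings, ideal_ratings))


# Two hypothetical subjects who describe themselves identically
# but hold different ideals.
subject_a = {"self": [3, 3, 3, 3], "ideal": [3, 3, 3, 3]}
subject_b = {"self": [3, 3, 3, 3], "ideal": [5, 5, 5, 5]}

for name, s in (("A", subject_a), ("B", subject_b)):
    print(name, rating_total(s["self"]), discrepancy_score(s["self"], s["ideal"]))
# A 12 0
# B 12 8
```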

According to the notion of face 
validity, what looks like a test of self- 
acceptance is such, by definition. All 
the test constructor is required to do, 
in terms of this criterion, is to elicit 
self-evaluative statements from sub- 
jects. All measures that conform to 
this requirement achieve validity and 
are therefore equivalent. By this 
procedure the test itself becomes the 
construct, in the sense of the narrow- 
est kind of operational definition. 

An operational definition stating what is measured by a given device or procedure in terms of specified measurement operations is, of course, a perfectly legitimate and necessary procedure in scientific investigation as long as the interpretation of results is strictly confined to the particular test or measurement procedure. A problem arises, however, when an attempt is made to generalize from experimental findings with a particular test to results obtained by different assessment operations. The problem similarly occurs in another case when a certain test is applied to an experimental problem and negative results are interpreted as disconfirming the hypotheses relating the construct to observables. As Jessor and Hammond (1957) have pointed out, in the absence of an explicit, logical relationship between the superordinate construct and the operations designed to assess it, conclusions cannot be made concerning the validity of the hypotheses, since invalid measurement operations could equally account for negative findings.

The point at issue is that tests of 
self-acceptance (or, for that matter, 
of any construct) which are based on 
different construct systems and in 
the development of which different 
procedures and items have been employed are not equivalent in the absence of empirical demonstration of
their relationships; they must be 
shown to be either highly related to 
each other or similarly related to 
other constructs in the nomological 
net. Further, in the absence of dem- 
onstrated equivalence, experimental 
results cannot be generalized to find- 
ings with a different instrument. 
This seems to be so obvious a con- 
sideration that explication here is 
redundant. The fact remains, how- 
ever, that the equivalence of self- 
acceptance tests has been assumed 
despite their independent derivation 
and despite the relative lack of em- 
pirical demonstration that there is a 
high degree of common variance 
among them. 

In respect to the latter point, three 
studies are of interest. Bills (1958) 
reports a correlation of .24 between 
the self-concept score on the Index of 
Adjustment and Values (IAV) and 
the “self-score” of the Phillips Attitudes Toward Self and Others Questionnaire (1951). A correlation of .56
is reported between the Bills self- 
ideal discrepancy score and the 
Phillips self-score. Omwake (1954) 
found a correlation of .55 between 
the IAV self-acceptance (self-ideal 
discrepancy) score and the self-score 
on the Phillips questionnaire and a 
correlation of .49 between the self- 
acceptance score on the IAV and the 
Berger self-acceptance scale (Berger, 
1952). In a recent study, Cowen 
(1956) found that two self-acceptance 
measures yielding self-ideal discrepancy scores (Bills IAV and the
Brownfain Self-Rating Inventory) 
were uncorrelated. The magnitude 
of these correlations indicates that 
the prediction of scores on one of these 
measures from scores on another 
would be accompanied by a wide 
margin of error. 
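The size of the prediction error implied by these coefficients can be checked with two standard quantities: the proportion of shared variance, r², and the coefficient of alienation, √(1 - r²), which gives the standard error of estimate as a fraction of the predicted measure's standard deviation. The Python sketch below applies both to the four nonzero correlations reported above; the function name is hypothetical.

```python
import math


def prediction_summary(r):
    """Return (shared variance r**2, coefficient of alienation sqrt(1 - r**2)).

    The coefficient of alienation is the standard error of estimate expressed
    as a fraction of the predicted measure's standard deviation.
    """
    return r ** 2, math.sqrt(1 - r ** 2)


# Correlations between self-acceptance measures, as reported in the text.
reported = [
    ("IAV self-concept vs. Phillips self-score (Bills, 1958)", 0.24),
    ("Bills self-ideal discrepancy vs. Phillips self-score (Bills, 1958)", 0.56),
    ("IAV self-acceptance vs. Phillips self-score (Omwake, 1954)", 0.55),
    ("IAV self-acceptance vs. Berger scale (Omwake, 1954)", 0.49),
]

for label, r in reported:
    shared, k = prediction_summary(r)
    print(f"{label}: r = {r:.2f}, r^2 = {shared:.2f}, k = {k:.2f}")
```

For the largest coefficient, .56, the shared variance is about .31 and the coefficient of alienation about .83; knowing a subject's score on one measure reduces the error in predicting his score on the other by only about 17 percent relative to predicting the group mean.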

The diversity of item selection 
procedures, item content, type of 
response elicited, and test format 
which is characteristic of test con- 
struction in this area suggests that 
what is operationally defined as self- 
acceptance on one test may be quite 
different from the sample of self- 
evaluative behavior elicited in an- 
other psychometric situation. Fur- 
ther, self-acceptance is construed 
differently by different theorists (cf. 
Block & Thomas, 1955; Butler & 
Haigh, 1954; LaForge & Suczek, 1955; 
Sarbin & Rosenberg, 1955), and 
these definitional differences are un- 
doubtedly reflected in self-acceptance 
tests. 

Even if one grants the assumption 
of face validity with its clearly im- 
plied meaning of equivalence as made 
by the experimenter, to assume that 
subjects will perceive these psycho- 
metric situations in the same way is 
another matter. It is quite conceiv- 
able that subjects may categorize the 
self-evaluative situations represented 
by the various tests of self-acceptance 
quite differently, with the result that 
scores obtained on these measures 
will not be congruent. According to 
this argument, a subject’s expectan- 
cies that his goals will be achieved or 
frustrated as a result of his sorting a 
number of statements on a forced- 
choice distribution from “like me” to 
“unlike me” (Butler & Haigh, 1954) 
may be quite different from the ex- 
pectancies aroused by a situation in 
which he is asked to attribute certain 
adjectival characteristics to himself
(Gough, 1955). Ironically, a phe- 
nomenological definition of a self- 
report variable is particularly obli- 
gated to account for differences in 
the subject’s perception of the meas- 
urement device. In any case, unless 
it can be shown that there is a high 
degree of congruence of the various 
measures within the experimental 
populations sampled, one is without 
means of measuring self-acceptance 
as phenomenologically defined. The 
individual’s private, unique experi- 
ence of self-satisfaction or dissatis- 
faction remains, indeed, private. 

It seems highly probable that dif- 
ferences among self-acceptance tests 
plus the likelihood that subjects will 
categorize these tests differently may 
result in the sampling of relatively 
nonoverlapping behaviors by the 
various tests. To be recognized is the 
fact that this is an empirical problem 
for which, to the writers’ knowledge, 
the three studies cited above provide 
the only suggestive evidence.² The
recently proposed model (Campbell 
& Fiske, 1959) for assessing conver- 
gent and discriminant validity would 
seem to be highly appropriate for 
determining the tenability of the as- 
sumption of equivalence of opera- 
tions for measuring self-acceptance. 
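As a rough sketch of how the Campbell and Fiske (1959) logic might be applied to self-acceptance tests, the Python below sorts the off-diagonal entries of a multitrait-multimethod correlation matrix into convergent values (same trait, different method) and discriminant values (different trait). It applies a single, deliberately stringent simplification: every convergent correlation must exceed every discriminant correlation. Campbell and Fiske's own criteria compare each validity value with the correlations in its own row and column, which is less strict than this pooled check. The traits, methods, correlations, and function names are hypothetical.

```python
from itertools import combinations


def split_mtmm(measures, corr):
    """Sort the off-diagonal correlations of a multitrait-multimethod matrix.

    measures: list of (trait, method) tuples.
    corr: dict mapping frozenset({m1, m2}) -> correlation.
    Returns (convergent, discriminant):
      convergent   -- same trait, different method (the validity diagonal)
      discriminant -- different trait, same or different method
    """
    convergent, discriminant = [], []
    for a, b in combinations(measures, 2):
        r = corr[frozenset((a, b))]
        if a[0] == b[0] and a[1] != b[1]:
            convergent.append(r)
        elif a[0] != b[0]:
            discriminant.append(r)
    return convergent, discriminant


def passes_simple_check(measures, corr):
    """Stringent simplification: every convergent r must exceed every discriminant r."""
    convergent, discriminant = split_mtmm(measures, corr)
    return min(convergent) > max(discriminant)


# Hypothetical traits, methods, and correlations, for illustration only.
SA_Q, SA_ACL = ("self-acceptance", "Q sort"), ("self-acceptance", "adjective list")
AD_Q, AD_ACL = ("adjustment", "Q sort"), ("adjustment", "adjective list")
measures = [SA_Q, SA_ACL, AD_Q, AD_ACL]
corr = {
    frozenset((SA_Q, SA_ACL)): 0.30,   # same trait, different method
    frozenset((AD_Q, AD_ACL)): 0.60,
    frozenset((SA_Q, AD_Q)): 0.50,     # different trait, same method
    frozenset((SA_ACL, AD_ACL)): 0.45,
    frozenset((SA_Q, AD_ACL)): 0.20,   # different trait, different method
    frozenset((SA_ACL, AD_Q)): 0.15,
}

print(split_mtmm(measures, corr))
print(passes_simple_check(measures, corr))
```

In this illustrative matrix the two self-acceptance measures correlate only .30 with each other but .50 and .45 with adjustment measured by the same method, so method variance outweighs trait variance and the check fails; the equivalence of the two self-acceptance tests would not be supported.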


DEFINITION OF THE CONSTRUCT 
Specifying Parameters 


The ability to reach generalized 
conclusions from current self-acceptance research seems to be limited by a
failure to give adequate definitions 
to the construct itself. As Rotter 


2 Since the completion of this article, further research has been published which bears directly on the problem of the equivalence of self-acceptance tests and suggests that a socially desirable response set may constitute a major source of variance (Crowne, Stephens, & Kelly, 1961).
(1954) has pointed out, it is important to distinguish between ideal,
theoretical, and operational defini- 
tions of a given construct. An experi- 
menter can define self-acceptance, 
for example, as the behavior sample 
(or as the “internal” phenomenal 
state reflected by the behavior sample) 
obtained on a particular test. But he 
is usually not interested in restricting 
his interpretation of his findings (if 
any) to this limited behavior sample, 
and he seeks to place his results in 
the larger context of research by 
other investigators and to generalize 
his findings to “real life” situations 
such as those encountered in clinical 
practice. By a narrow interpretation 
of operationism, the experimenter has 
made it logically indefensible to re- 
late his findings to a theoretical sys- 
tem, to results obtained with other 
measurement devices, or to “real 
life” situations. When nothing more 
than an operational definition is 
offered, the parameters defining the 
variable are not specifiable, and there 
is no basis for generalization of the 
results. 
At the other extreme, definitions of self-acceptance at an abstract level, not specifically articulated with other variables in a theory or tied to a test, are apt to be semantically loose and to be subject to differing interpretations. It is true, of course, that definitions of variables at this level transcend any particular set of operations and can usually be applied to an infinite variety of situations and behaviors. The looseness of such definitions, however, precludes rigorous tests of hypotheses and makes unambiguous communication impossible. In self-acceptance research there have been few if any definitions of the construct which are not either operational or highly abstract. The reduction from an abstract definition, with all its surplus meanings, to specific operations is likely to
be a tenuous one and, perhaps more 
often than not, is a private, nonre- 
peatable process. An intervening 
step is necessary in which the con- 
struct is broadly defined in terms of 
specific behavioral referents and pref- 
erably in relation to other variables 
in a specific theory. A “working 
definition,” as Rotter has defined it, 
clearly represents an attempt to 
specify the parameters of the variable 
in question so that both generality 
and precise communication are gained.
Self-acceptance research appears to 
have lacked such definitions. 
Although this paper is chiefly con- 
cerned with pointing out certain 
methodological pitfalls in research on 
self-acceptance, some clarification 
may be achieved by defining briefly 
this intermediate theoretical step and 
attempting to relate the logic of con- 
struct validation to the more general 
theoretical problem. Rotter's work- 
ing definition could be described as a 
definition at the construct level. In 
terms of this view, the behavioral 
referents and the hypothesized relation- 
ships of the construct are described 
as part of its definition—that is, the 
implied meanings of the term are 
publicly specified. In effect, specify- 
ing the behavioral referents and 
hypothesized relationships reduce to 
the same thing: locating the construct 
in a nomological net. In the language 
of test construction, Cronbach and Meehl (1955) write:


Construct validation takes place when an investigator believes that his instrument reflects a particular construct to which are attached certain meanings. The proposed interpretation generates specific testable hypotheses, which are a means of confirming or disconfirming the claim. . . . To validate a claim that a test measures a construct, a nomological net surrounding the concept must exist [italics added] (pp. 290-291).
The logic of construct validation cannot be invoked to justify the identification of a particular set of operations as unique to a given construct,
nor does it support the view that a 
construct is “validated” by the con- 
firmation of a single hypothesis. The 
establishment of a single relationship 
belongs more properly in the domain 
of criterion oriented validity, as Cron- 
bach and Meehl point out. With con- 
struct validation procedures clearly 
at issue, it would seem to be desirable 
to specify in advance the referents of 
self-acceptance. When the situations 
in which the behaviors subsumed under the construct occur and the behaviors
themselves are identified, some idea 
of the generality and functional unity 
of self-acceptance is afforded, and 
relationships to other constructs, 
situations, and measurement opera- 
tions can be suggested at a logical 
level. 

Underwood (1957) has described 
the difficulty in moving from theo- 
retical definitions (or constructs) to 
operational definitions—a difficulty 
that appears to be characteristic of 
psychological research. Campbell 
and Fiske (1959) have extended 
Underwood's point to show that the 
transition from operations to con- 
struct can involve perplexities equally 
difficult. The essence of the latter 
problem is that a single set of opera- 
tions is capable of multiple interpre- 
tations; convergence on a single inter- 
pretation (that is, establishing that a 
relationship holds in a particular 
nomological net and cannot be more 
adequately accounted for in another 
net) is achieved by a process of tri- 
angulation from a number of different 
operations. Convergent validation, 
however, involves complex designs 
and extensive preliminary research 
efforts. Further, convergent valida- 
tion does not necessarily help to make 
more explicit the descent from a theoretical model to measurement operations. According to the present view
of definition at the construct level, 
this explicitness would be achieved 
and the reverse problem, that of 
interpreting results from a set of 
operations, might be at least partially 
solved. That is, alternative explana- 
tions of experimental findings could 
be examined in the light of the hy- 
pothesized relationships proposed in 
different construct systems claiming 
to explain the same body of data, 
with the result that incomplete or 
inconsistent interpretations might 
be discarded in favor of interpretations whose “fit” to the data is more
adequate. 

For example, phenomenological 
theory implicitly hypothesizes a lin- 
ear relationship between self-accept- 
ance and adjustment (Butler & 
Haigh, 1954), while acknowledging 
the possibility that very high re- 
ported self-acceptance may indicate 
“defensive” unwillingness to reveal 
personal dissatisfaction. Block and 
Thomas (1955), however, have shown 
that a curvilinear model, in which 
both very high and very low self- 
acceptance are associated with mal- 
adjustment, affords a better explana- 
tion of the phenomenon of defensive- 
ness. It is conceivable that more ex- 
plicit formulation of the phenomeno- 
logical self-acceptance construct and 
its derived test procedures might 
have provided a more adequate explanation of defensive responding in the Butler and Haigh study. More
precise definition of the variable in 
question might thus have directed a
search for operations less susceptible 
to systematic response bias. 

In a recent paper, Cowen and 
Tongas (1959) have reviewed a number of construct validation studies on the IAV (Bills, 1958). They point to the fact that several of these studies have reported significant results in the direction opposite to theoretical
expectation. In one study, on 10 of 21 
hypotheses specifying differences be- 
tween high and low self-acceptance 
scorers, many differences were found 
which indicated that subjects with 
high self-acceptance scores were more 
maladjusted than low scorers (Bills, 
1953a). As Cowen and Tongas ob- 
serve, high self-acceptance should 
theoretically be associated with satis- 
factory adjustment, not maladjust- 
ment. Another theoretical incon- 
sistency occurred in the failure to 
show that lowered self-concept ratings and longer response times in word association are associated with conflict and emotionality. The results of this study were again, in fact, significant in the opposite direction (Bills, 1953b). Bills interpreted these findings as indicating a decrease in defensiveness. Cowen and Tongas note, however, that:


Unless procedures can be specified before the fact, by which we can discriminate the high SC (self-concept) score representing good adjustment from the high SC score representing defensiveness, we are operating within a system in which the results of a given experiment, irrespective of their direction, are interpreted as confirming the underlying theory (pp. 362-363).
Self-acceptance research is in need of clear construct-level definitions in which the relationships of the construct to other variables are explicitly stated. These definitions must refer primarily to the relationship of self-acceptance to other variables in the theory in which the construct is embedded. Depending upon the theory, definitions might specify the nature of the relationship of self-acceptance to adjustment; to such personality variables as creativity, neuroticism, and defensiveness; to interpersonal variables such as acceptance of others; to environmental, social, and cultural variables, as, for example, the role of cultural sanctions in self-evaluation, or the influence of the experimental (or therapeutic) context on self-appraisal.


Representative Sampling 


A second problem associated with 
the definition of the parameters of 
self-acceptance concerns the rep- 
resentative sampling of self-accept- 
ance test items. As applied to the 
construct of self-acceptance, the prob- 
lem of representative sampling is 
involved in the systematic sampling 
of some specified universe of self- 
evaluative behaviors. Assuming that 
one has defined this population theo- 
retically, it is then of importance to 
draw one’s sample of test items in 
such a way as to represent their oc- 
currence in the population. The 
achievement of representative sam- 
pling in this respect means that gen- 
eralization can reasonably be at- 
tempted to other situations and/or 
behaviors than those of a particular 
experiment or test. Although the 
behavioral referents of self-accept- 
ance might seem obvious, on closer 
scrutiny it appears that there is 
notable confusion resulting from a 
lack of consensus as to what these 
referents are. 

Some examples from published 
research may illustrate what is im- 
plied by failure to sample representa- 
tively a population of self-evaluative 
behaviors. Butler and Haigh (1954) 
begin with Rogers’ abstract defini- 
tion of the self-concept. Then, they 
write: 

A set of one hundred [self-reference] state- 
ments was taken at random from available 
therapeutic protocols. (Actually, the state- 
ments were selected on the basis of accidental, 
rather than random, sampling) (p. 57). 


The population of relevant self- 
percepts was therefore restricted to 
those verbalized by some sample of 
clients in client centered therapy, the
basis for sampling was accidental, 
and thus there is no precise definition 
of self-acceptance in terms of what 
particular self-percepts define its 
parameters. The finding that changes 
in self-acceptance were demonstrated 
to occur as a function of client cen- 
tered therapy is thereby limited to 
the particular conditions of this ex- 
periment, the subject population 
used, and the particular items em- 
ployed in the Q sort measure. For 
example, it is quite possible (but un- 
known) that the statements used 
comprise a sample biased in favor of 
client centered counseling as per- 
ceived and defined by the judges 
(presumably Butler & Haigh) who 
selected the items. 

A second example can be seen in 
the development of the IAV (Bills 
et al., 1951). The items (adjectives) 
in the IAV were drawn from Allport 
and Odbert’s (1936) list of 17,953 
traits. The basis of selection was the 
frequent appearance of the adjective 
in question in client centered inter- 
views and whether it presented a 
“clear example of self-concept defini- 
tion.” Self-evaluation on the IAV, 
then, pertains only to the Allport and 
Odbert traits mentioned frequently 
in client centered interviews, and 
generalization to other self-evalua- 
tive situations, or traits, would be 
tenuous. 

Gough’s (1955) Adjective Check 
List (ACL) affords a third illustra- 
tive example. The ACL consists of 
300 adjectives selected from Cattell’s 
(1943, 1946) consolidation and fac- 
torization of the Allport and Odbert 
trait list. The basis on which the 300 
adjectives in the ACL were derived 
from Cattell’s list of 171 trait vari- 
ables is not specified. In addition, 
the assumptions both of Allport and 
Odbert in their original derivation of 
the trait list and of Cattell in his 
factorization are further restrictions
in interpreting ACL scores. 

With such lists of traits or items 
it is necessary to assume either that 
they truly represent all self-percepts, 
or at least that they represent the 
most important ones. But, especially 
for the phenomenologist, must it not 
be assumed that these are different 
for different subjects and/or subject 
populations? Must not this list, then, 
be tailor-made to the subject to be 
truly representative for him (a totally 
idiographic procedure)? Perhaps 
what is required is that the subject 
generate his own list of self-descrip- 
tions, or a self-description, and the 
values he attaches to the separate 
elements and to the composite. 
Kelly’s (1955) Role-Construct Reper- 
tory Test appears to fit this model. 

It would seem possible to achieve 
some degree of representativeness in
the sampling of a defined universe of 
self-reference items. The definition 
of the population is properly referable 
to the theory in which the self-ac- 
ceptance construct is embedded. 
That is, one should be able to deduce 
from the theory the nature of the 
items to be sampled (although, from 
a phenomenological theory, one might 
protest that this population of items 
is unique to the individual; but this 
only thickens the soup). Not only 
should the population of subjects be 
specifiable (for example, the theory 
has particular relevance to persons in
client centered therapy), but what 
constitutes a relevant self-evaluative 
statement (that is, the basis for self- 
evaluation) should be deducible as 
well. The relative adequacy of theo- 
ries employing self-acceptance con- 
structs is clearly at issue in this case. 

With regard to the problem of 
sampling a defined universe, one approach has been suggested by Crowne (1959). Definitions of self-accepting and self-derogatory behavior from the point of view of
social learning theory (Rotter, 1954) 
were given first to psychologist 
judges and then to judges drawn from 
the subject population (introductory 
psychology students) to which gen- 
eralization was intended. The psy- 
chologists were asked to generate 
from these definitions lists of self- 
evaluative behaviors—that is, be- 
havioral referents, or cues, of self- 
acceptance and self-rejection—com- 
mon in such a subject population. 
Subject judges were given a list of 
300 adjectives (actually, the ACL) 
and asked to rate each adjective in 
terms of the extent to which they 
felt that, if it were checked by one of 
their peers as descriptive of himself, 
self-acceptance or self-rejection would 
be indicated. Items were then se- 
lected on the basis of high interjudge 
agreement of both psychologist and 
subject judges. In this way the
items were tied to, and representa- 
tive of, both the superordinate theory 
and the specific population of self- 
evaluative behaviors common to the 
experimental population. This procedure was still limited, however, to the extent that the list of 300 adjectives failed to represent some clearly defined universe. Generalizing the procedures used in this study, it would be possible to elicit descriptions of self-acceptance and self-rejection (the definitions for the judges being derived from theory) from a large number of judges drawn from the subject population. Items might then be selected from descriptive units on which there was high interjudge agreement. The methodological and psychometric considerations discussed by Jessor and Hammond (1957) would presumably dictate the form of the scale, type of response, and related aspects of test construction.
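The item-selection step described above can be sketched as a simple agreement rule. The Python below is a minimal illustration, assuming that each judge's rating has already been coded as "accept" (checking the adjective would indicate self-acceptance), "reject" (self-rejection), or "neutral", and that an item is retained only when at least a chosen proportion of each panel, psychologists and subject judges alike, keys it in the same direction. The coding scheme, the .80 threshold, the function names, and the judgments are all hypothetical; the text does not report the agreement criterion Crowne (1959) used.

```python
def keyed_direction(codes, threshold):
    """Return 'accept' or 'reject' if at least `threshold` of the judges chose it, else None."""
    if not codes:
        return None
    for direction in ("accept", "reject"):
        if codes.count(direction) / len(codes) >= threshold:
            return direction
    return None


def select_items(psych_codes, subject_codes, threshold=0.8):
    """Keep items on which both judge panels agree, at or above threshold, on the same key.

    threshold should exceed 0.5 so that at most one direction can qualify.
    """
    selected = {}
    for item in sorted(psych_codes.keys() & subject_codes.keys()):
        d_psych = keyed_direction(psych_codes[item], threshold)
        d_subject = keyed_direction(subject_codes[item], threshold)
        if d_psych is not None and d_psych == d_subject:
            selected[item] = d_psych
    return selected


# Hypothetical judgments: 5 psychologist judges, 10 subject judges.
psych = {
    "confident": ["accept"] * 5,
    "worthless": ["reject"] * 5,
    "quiet": ["accept", "reject", "neutral", "accept", "neutral"],
}
subjects = {
    "confident": ["accept"] * 9 + ["neutral"],
    "worthless": ["reject"] * 9 + ["accept"],
    "quiet": ["neutral"] * 10,
}

print(select_items(psych, subjects))
```

Here "confident" and "worthless" are retained and keyed in opposite directions, while "quiet" is dropped because neither panel keys it as indicating acceptance or rejection.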


This section has been concerned with two problems related to the
definition of the parameters of self- 
acceptance: (a) the necessity of pro- 
viding a definition at the construct 
level, in which the behavioral refer- 
ents of self-acceptance are specified 
and the construct located in a nomo- 
logical net; and (b) the need to con- 
sider the representativeness of the 
sampling of a population, as defined in a theory, of self-reference statements or items. Failure to meet these criteria
results in the inability of the experi- 
menter or test constructor legiti- 
mately to generalize from the par- 
ticular conditions (subjects and stim- 
uli) of his experiment or test. 


SOCIAL DESIRABILITY


The third general issue to raise 
concerns the extent to which self- 
evaluative responses are influenced 
by “defensive behavior” (Butler & 
Haigh, 1954; Zuckerman & Monashkin, 1957), “self-protective response tendencies” (Crowne, 1959), or “social desirability” (Edwards, 1957;
Kenny, 1956). It is important, how- 
ever, first to consider whether these 
terms refer to the same or different 
phenomena. 

Butler and Haigh apply the term 
“defensive responding” to the re- 
sponses of those individuals who do 
not reveal the extent of their self- 
dissatisfaction and who, by other 
criteria, would be judged as malad- 
justed. (These authors thus seem to 
reject, for some subjects at least, the 
assumption of validity of self-reports, 
although how this can be done within 
a phenomenological frame of reference is hard to understand.) “Defensiveness” has been used by
Zuckerman and Monashkin to refer 
to the phenomenon whereby “The 
person who is self-satisfied is likely 
to answer MMPI items in a way 
which he considers personally and 
socially desirable” (p. 147). Crowne used the term “self-protective behavior” to refer to the unwillingness
of some individuals to acknowl- 
edge self-dissatisfaction. These three 
terms, then, have been used to refer 
to highly similar phenomena. 
“Social desirability” as defined by 
Edwards (1957), however, refers pri- 
marily to the 
scale value for any personality statement such that the scale value indicates the position of the statement on the social desirability continuum (p. 3).


It also applies, as Edwards further 
points out, 

to the tendency of all subjects to attribute to themselves, in self-description, personality statements with socially desirable scale values and to reject those with socially undesirable scale values (p. vi).


Whereas the above concepts of “defensiveness” have been applied to the
motivation, presumably greater for 
some subjects than for others, to con- 
ceal self-dissatisfaction, Edwards’ no- 
tion of “social desirability” refers to 
a characteristic of items—that is, 
their location on a continuum of 
social desirability, which determines 
the proportion of subjects who will 
attribute the characteristics to them- 
selves. 

Butler and Haigh, and also Zuckerman and Monashkin, conclude that
subjects who are unwilling to attrib- 
ute undesirable characteristics to 
themselves or confess self-dissatis- 
faction are by that very fact malad- 
justed, and presumably therefore 
self-dissatisfied. This, however, is 
obviously an hypothesis for investiga- 
tion, and not necessarily true by 
definition. Self-acceptance tests do 
not directly indicate whether the 
subject is willing to express self- 
discontent, but only whether he does 
express it. Zuckerman and Monash- 
kin have also suggested, in fact, that 
subjects giving more socially undesirable responses may have a different conception of what is socially
desirable, and thus they implicitly 
suggest that these subjects may ac- 
tually not differ in terms of their need 
to respond in a socially desirable 
fashion. Such a difference in concep- 
tion of what is socially desirable 
might be expected to be associated 
with maladjustment, but it would 
certainly be a less direct indication 
of self-dissatisfaction per se. 

Four separate hypotheses could be 
advanced concerning the relationship 
between social desirability and re- 
sponses on self-acceptance (or any 
other self-report) tests. Each of these 
is capable in some degree of being 
tested. 

Hypothesis I. Social desirability 
has no effect on test responses. This 
is essentially the assumption of 
validity of self-reports: that what the 
subject says about himself is a valid 
and direct indication of what he feels 
or thinks, at least at the time, about 
himself. This, incidentally, seems to 
be a necessary assumption for phe- 
nomenologists, although it is a test- 
able proposition. 

Hypothesis II. Social desirability 
factors account for equal variance in
all subjects’ test scores. This assump- 
tion is tenable from Edwards’ ap- 
proach and could be held even in the 
face of most of the research to be 
reported below. It posits, in effect, 
that once one has accounted fot 
variance due to nomothetically de- 
termined social desirability in any 
subject’s test score, what is left indi- 
cates the subject’s true self-feelings. 
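Hypothesis II implies a simple partialling procedure: predict each subject's score from a social desirability measure that works the same way for everyone, and treat what is left over as the self-evaluative component. The Python sketch below assumes a linear relation fitted by least squares across subjects; the function name `residualize` and all scores are hypothetical and are not data from the studies discussed here.

```python
from statistics import mean


def residualize(scores, sd_scores):
    """Return scores with the part linearly predictable from sd_scores removed.

    Hypothesis II is modelled (as an assumption) by a single least-squares
    line shared by all subjects.
    """
    mx, my = mean(sd_scores), mean(scores)
    sxx = sum((x - mx) ** 2 for x in sd_scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sd_scores, scores))
    slope = sxy / sxx              # one regression weight for every subject
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(sd_scores, scores)]


# Hypothetical scores for six subjects (illustration only).
self_acceptance = [42, 55, 38, 61, 47, 50]
social_desirability = [10, 14, 9, 16, 12, 11]
residuals = residualize(self_acceptance, social_desirability)
print([round(r, 2) for r in residuals])
```

Under Hypotheses III and IV a single common adjustment of this kind would not suffice, since the social desirability component would then differ from subject to subject.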

Hypothesis III. Social desirability, while it may or may not be an important factor for all subjects, accounts for more of the variance for some subjects than for others. This corresponds to the suggested differences in need to perform in a socially desirable way, protect the self, and
disguise self-discontent. It is inter- 
esting that such need has been sup- 
posed to be an important variable 
only for those who show relatively 
high self-acceptance or social de- 
sirability scores: the rebel, or the 
individual seeking succorance, may 
produce very low scores, as a result of 
a complementary need to perform in
a socially undesirable way, and still 
not necessarily differ from others in 
terms of over-all adjustment or 
“true” self-acceptance. In any case, 
such a conception as this suggests 
research determining the correlates 
of this need to perform in a socially 
desirable, or to perform in a socially 
undesirable, way. 

Hypothesis IV. Variance associ- 
ated with a nomothetically deter- 
mined social desirability factor re- 
flects differences in the conception of 
what is socially desirable. This hypothesis is not necessarily in conflict with Hypothesis III: both factors could operate simultaneously, although separating the variance due to each might be quite difficult. This, as well as Hypothesis III, is definitely at variance with Hypotheses I and II.


With the above distinctions in mind, then, the results of some investigations of the relationship of the social desirability variable to self-acceptance test scores can be examined. Kenny (1956) gave 25 self-report items previously employed in a study by Zimmer (1954) to a group of judges for social desirability scaling. Three independent samples of subjects then responded to these items in the form of a questionnaire, a rating scale, and a Q sort. The correlations between the social desirability scale values and scores obtained on the questionnaire, rating scale, and Q sort were
.82, .81, and .66, respectively. The 
last two correlation coefficients are based on “real self” scores. Social desirability correlated .82 with the “ideal self” rating scale score and .59 with the “ideal self” Q sort.
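Correlations of this kind are computed across items rather than across subjects: each item's social desirability scale value is paired with an index of how that item was answered (for example, the proportion of subjects endorsing it, or its mean placement in a Q sort). A minimal sketch, with hypothetical item data and a hand-written `pearson_r` helper; the numbers are not Kenny's.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one entry per item.
scale_values = [1.2, 2.5, 3.8, 5.1, 7.4]        # judged social desirability
endorsement = [0.10, 0.25, 0.40, 0.70, 0.85]    # proportion of subjects endorsing
print(f"r = {pearson_r(scale_values, endorsement):.2f}")  # prints r = 0.98
```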

Edwards (1955, 1957) and Edwards 
and Horst (1953) have also shown 
that Q sorts are highly influenced by 
the social desirability variable. In a 
study reported in 1955 and reviewed 
in 1957, Edwards found correlations 
of .84 and .87 for males and females, 
respectively, between item place- 
ment on a Q sort and the social de- 
sirability scale values of the items. 
In this case, the items were those 
employed in the development of the 
Edwards Personal Preference Sched- 
ule (1953). 

In one study (Kogan, Quinn, Ax, & 
Ripley, 1957), a social desirability 
scale value-response correlation of .67 was found in a hospitalized psychiatric patient sample diagnosed as
psychoneurotic. The correlation in a 
control group of male college students 
was .85. It is interesting to speculate 
upon the possible significance of the 
difference in the magnitude of the 
correlation between self-description 
and social desirability values found 
for the patient and nonpatient groups. 
Perhaps, as Hypothesis IV proposes, 
maladjusted persons have different 
conceptions of social desirability in 
self-evaluative situations. 

Studies by Berger (1955), Block 
and Thomas (1955), and Zuckerman 
and Monashkin (1957) are also rele- 
vant to the problem of social desir- 
ability. These studies investigated 
the relationships between self-accept- 
ance and the clinical and “validity” 
scales of the MMPI. Although the studies employed different subject populations (college undergraduate students in the first two studies and hospitalized psychiatric patients in the latter investigation) and different measures of self-acceptance, there was considerable agreement in the findings. Self-acceptance was
found to be significantly negatively 
correlated with a number of the 
clinical “adjustment” scales and posi- 
tively correlated (r’s ranging from 
.33 to .58) with the K scale, inter- 
preted as a measure of test-taking 
defensiveness (McKinley, Hathaway, 
& Meehl, 1948). Zuckerman and 
Monashkin took their findings to 
mean that “both self-acceptance and 
MMPI scales are probably being 
influenced more by the common trait 
of defensiveness than by actual adjustment” (p. 147). The term “defensiveness,” with its connotation of
maladjustment, seems less applicable 
here than “social desirability,” espe- 
cially in view of the high correlation 
(.81) reported by Edwards (1957) 
between the K scale and his Social 
Desirability Scale. With approxi- 
mately 65% of the variance accounted 
for in the covariation of these two 
scales, the results of the three studies 
cited above would seem to be a func- 
tion of the common denominator of 
social desirability. Thus, the items 
on the self-acceptance tests used and 
those on the MMPI are highly re- 
lated to the scale values on Edwards’ 
Social Desirability Scale. 
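The “approximately 65%” figure is the squared correlation: .81² ≈ .656. A one-line helper makes the arithmetic explicit (the labels are descriptive only):

```python
def shared_variance(r):
    """Proportion of variance two measures share: the squared correlation."""
    return r ** 2


# Correlations quoted in this section.
for label, r in [("K scale vs. Social Desirability Scale", 0.81),
                 ("SD scale values vs. questionnaire scores (Kenny)", 0.82)]:
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.1%}")
```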

In the study referred to earlier, 
Cowen and Tongas found a correla- 
tion of .91 between social desirability 
ratings and the self-concept score of 
the IAV. A correlation of .96 was 
obtained between social desirability 
ratings and the ideal-self score on the 
IAV. The latter correlation might 
be taken to suggest culturally stereo- 
typed conceptions of what one ought 
to be that would be consistent with 
Hypothesis IV above. In another 
investigation (Nebergall, Angelino, & 
Young, 1959), it was found that subjects who reported high self-acceptance tended to disagree with group
judgments of adjustment. For most 
subjects, in fact, self-acceptance rat- 
ings were higher than group ratings. 
Again, these findings may be under- 
stood in terms of the individual's 
need to present himself in what he 
regards as a culturally sanctioned 
manner. 

While this discussion has been 
concerned primarily with the social 
desirability factor in self-acceptance 
tests, it seems highly probable that 
any self-report device will be affected 
by the social desirability of items or 
of available responses. Failure to 
control for the effects of this variable 
by one of several available procedures 
(Edwards, 1957) means, in effect, 
that the test in question may better 
be interpreted as a measure of social 
desirability (that is, the subject's 
conception of social desirability or 
need to perform according to it) than 
of self-acceptance. This can be il- 
lustrated by means of an hypothetical 
experiment. It might be hypothe- 
sized that need-determined percep- 
tual behavior—for example, percep- 
tual reactivity to threat—is related 
to self-acceptance (cf. Cowen et al., 
1957). Failure to control for social 
desirability in the self-acceptance 
assessment operations would make 
the results, no matter what the outcome, uninterpretable in terms of self-acceptance. In the light of what is
already known about the influence 
of social desirability on self-report 
devices, the most probable interpre- 
tation of such an experiment would 
be that perceptual reactivity to
threat is related (or unrelated) to the 
socially desirable responding of subjects—that is, their need to be perceived in a particular way or their
conception of how they want to be 
perceived. Not provided in this experiment are the operations for determining the relationship between
perceptual reactivity and “real” self- 
acceptance. 

While studies of the effect of the 
social desirability variable on many 
of the commonly employed tests of 
self-acceptance have not been done, 
the results of the investigations dis- 
cussed above would suggest that self- 
evaluative tests are particularly sus- 
ceptible to criticism on social desir- 
ability grounds. A common denomi- 
nator in research findings on self- 
acceptance may well be the variable 
of social desirability. Edwards (1957) 
and Jackson and Bloomberg (1958) 
have made a similar analysis with 
respect to the Taylor anxiety scale 
(Taylor, 1953). Systematic investi- 
gation of both the parameters and 
the effects on test behavior of social 
desirability would clearly seem to be 
in order. That self-acceptance tests 
are influenced by factors other than 
the manifest content of the items, 
however, seems beyond dispute. 


THE GENERALITY OF SELF- 
ACCEPTANCE 


To this point the issues discussed have been pertinent strictly to psychometric and methodological problems in assessing self-acceptance. A further issue to be raised, although it certainly has methodological ramifications, is the primarily theoretical question of the generality of self-acceptance.

This issue involves two related problems, one empirical and the other theoretical, a matter of interpretation. Empirically, there is need of data concerning the temporal stability of self-acceptance; the consistency of an individual’s self-acceptance from one situation to another (for example, in friendly vs. unfriendly situations, or where self-effacement is rewarded or not rewarded); the generality of self-acceptance in
reference to different aspects of the 
“self” (for example, in reference to 
morality vs. in reference to interper- 
sonal effectiveness); and agreement 
of different kinds of manifestation of 
self-acceptance (for example, spon- 
taneous self-appraisal vs. that mani- 
fested in an undisguised test such as 
the ACL vs. inferences drawn from a 
TAT protocol). The theoretical 
question is simply how best to con- 
strue self-acceptance. If, as has been 
suggested (Rogers, 1951), the self- 
concept and self-acceptance can be 
considered to be relatively stable 
characteristics of a person, one should 
find that situational variables have 
only a negligible effect on self-accept- 
ance, that measures of self-accept- 
ance taken in different social contexts 
are highly correlated, and that meas- 
ures taken over temporal intervals 
are likewise highly stable. If these 
questions can be answered positively, 
it would be reasonable to construe 
the self-concept, from which the dis- 
crepancy notion of self-acceptance is 
derived, as a meaningful variable on 
which there are consistent differences 
between subjects, and it would be 
highly appropriate to think of in- 
dividuals in terms of their character- 
istic levels of self-acceptance. To the 
degree that self-acceptance is a func- 
tion of variables associated with 
specific situations or types of situations, however, it will be more fruitful to investigate self-evaluative behavior per se and its situational determinants.

The empirical evidence with re- 
spect to the generality of self-accept- 
ance is rather scanty. The fact that 
studies have not attacked this ques- 
tion may be attributable to the gen- 
eral assumption that self-acceptance 
is consistent. Three investigations 
have been reported which do bear on 
this question. With respect to tem- 
poral stability, Taylor (1955) reports 
a test-retest correlation of .79 (presumably based on self-sort vs. ideal-sort discrepancy scores) over an
interval of a week. Butler and Haigh 
(1954) report the correlations be- 
tween self-sorts and ideal-sorts for 
each subject in a control group 
(N=16) not receiving therapy for 
two Q sort administrations separated 
by a considerable period of time. 
Although consistency was apparent, 
Butler and Haigh noted that 

there are some sharp individual changes which indicate that alteration in self-ideal congruence does occur at times in the absence of therapy (p. 67).


Concerning the influence of situational variables on self-acceptance, a
study by Thorne (1954) is relevant. 
Employing the IAV, Thorne found 
that following induced failure on a 
mirror drawing task, subjects whose 
initial level of self-acceptance was 
high tended to lower their self-ratings
in the direction of a more realistic 
evaluation, while originally low self- 
accepting subjects tended to increase 
self-acceptance scores and showed 
concern over loss of self-esteem. The 
results of this study would suggest 
that self-acceptance is influenced by 
environmental events and that persons respond self-reflexively to perceived successes and failures.

It would appear, from this brief 
discussion, that studies should be 
devoted to the problem of the gen- 
erality of self-evaluative behavior. 
Of particular interest are the ques- 
tions of temporal stability, influence 
of situational variables, and the 
effect on self-evaluation of such fac- 
tors as success, failure, and punish- 
ment. 


SELF-ACCEPTANCE VS. SELF- 
EVALUATIVE BEHAVIOR 


It has been necessary at several 
points in this discussion to point out 
differences between a phenomenological and a behavioristic approach
to self-acceptance. Since these differ- 
ences are basic to the research ap- 
proaches—not to mention the way in 
which such research is construed— 
in this general area of inquiry, and 
since these differences seem not to 
have been fully appreciated by all 
who have written on the topic, some 
further discussion of them is in order. 

A phenomenological approach to 
self-acceptance is concerned with 
self-acceptance itself, or “real” self- 
acceptance, as a totally private, sub- 
jective experience of the subject. By 
definition this is never observable by 
any other; the best that an experi- 
menter or clinician can hope to do is 
make relatively accurate guesses, or 
inferences, concerning the existence, 
or degree, of the variable as it “exists”
in the subject. By such a definition, 
self-acceptance corresponds to Mac- 
Corquodale and Meehl’s (1948) early 
conception of an “hypothetical construct”—something which cannot be
observed but still is assumed to 
exist—except that there is little 
suggestion that self-acceptance even 
can be observed by anyone other than 
the subject himself. It is only with 
some difficulty, it would seem, that a 
phenomenologist can avoid the neces- 
sity of assuming the validity of self- 
reports. Representative sampling, 
and also an idiographic procedure for 
determining what are the most salient 
aspects of a subject’s self-evaluation, 
would seem to be most important in a 
phenomenological approach to the
assessment of self-acceptance. Social 
desirability, on the other hand, 
should be assumed not to be a factor
in self-reports. To assume a high de- 
gree of generality or consistency— 
temporal, situational, etc.—is not 
necessarily essential to a phenomeno- 
logical approach; however, in any 
theory which posits generalized self-acceptance as an important dimension on which to compare people,
empirically determined generality of 
the variable is, naturally, crucial. 

A behavioristic concern with self- 
acceptance might more clearly be 
directed toward “self-evaluative behavior,” on the other hand. The additional inference of some underlying,
real if unobservable, phenomenologi- 
cal state is not essential to a study of 
self-evaluative behavior per se; and it 
might be pointed out that self-evalua- 
tive behavior is an interesting and 
perhaps important focus of interest 
in and of itself. In such an approach, 
the assumption of validity of self- 
reports is clearly not essential; a 
clear construct-level definition of 
self-evaluative behavior, on the other 
hand, is. Generality, representative 
sampling in test construction, and the 
related question of equivalence of 
assessment operations are crucial 
questions only if the goal is to approach self-evaluative behavior as a
trait, or consistent behavioral tend- 
ency, by which to classify people in a 
generalized fashion. It is quite feasible to examine self-evaluative behavior as a situationally determined phenomenon, or as one determined by a situation-person interaction, rather than as a trait. Social desirability, etc., then become variables related (or unrelated) to self-evaluative behavior rather than sources of error variance; and, most important, it becomes a relatively simple matter to determine the correlates (such as “adjustment”) of various forms of self-evaluative behavior, either in general or in specific situations.

This is not meant to imply that phenomenological interest in self-acceptance is unsophisticated; but a conception of “internal” phenomenal
states such as self-acceptance would 
seem to be best derived from the ob- 
servable behaviors of the person— 
that is, his self-evaluative behaviors. 
Phenomenological research would ap- 
pear, in fact, to involve complexities 
that do not attach to more behavior- 
istic efforts. 


SUMMARY AND CONCLUSIONS 


“Self-acceptance” promises to become an increasingly attractive focus
of interest in both formal and infor- 
mal psychological theory. A con- 
siderable volume of research has al- 
ready been devoted to the topic, and 
a sizeable number of tests devised for 
such research. To this date, however, 
research has contributed an unknown, 
but perhaps very small, amount of 
understanding of self-acceptance and 
its relationships to other personality 
variables. The failures of self-accept- 
ance research can be traced, at least 
in large part, to neglect of several 
crucial psychometric and methodo- 
logical principles: the unsupported 
assumption of equivalence of assess- 
ment procedures, the absence of any 
clear construct-level definition of the 
variable, failure to construct tests in 
accord with principles of representa- 
tive sampling, and questions concern- 
ing the social desirability factor in 
self-report tests. In addition, the 
absence of data concerning the gen- 
erality of self-acceptance makes re- 
search results even more difficult to 
interpret; and the implications of the 
difference between a phenomenologi- 
cal approach to self-acceptance and a 
behavioristic approach to “self-evaluative behavior” have not been clearly understood.

The relative absence of systematic 
efforts in test development, standard- 
ization, and validation in this area is 
perhaps due to the fact that the focus 
of self-acceptance research to date has 
been chiefly on the preliminary testing of hypotheses, rather than the
development of adequate tests as a 
primary aim. A test designed solely 
for the purpose of testing one or two 
hypotheses does not, it might be 
argued, require so much care as a test 
designed to serve as a standardized 
instrument for many purposes. In- 
deed, such an argument would con- 
tinue, this care and time are not 
usually appropriate for such re- 
stricted purposes. (The development, 
use, and subsequent misuse of the 
Taylor Manifest Anxiety Scale would 
serve as a case in point, as Taylor herself (1956) has pointed out.) But when such tests are then used in further research as if they had been carefully and adequately constructed,
little can ensue but error and confu- 
sion. And such seems to be the case 
in self-acceptance research. 

Perhaps it is true that these tests 
are not yet used commonly in clinical 
settings where their inadequacies 
could lead to disservice to the client; 
perhaps it is true that the tests are 
used for very little other than re- 
search. But this only makes rigorous 
test construction the more important 
if research in such a complex area is to 
produce dependable and unambig- 
uous results. 

THE CONSTRUCTION OF UNIDIMENSIONAL TESTS

University of Western Australia


It is the purpose of this article to 
review methods which have been sug- 
gested, either directly or indirectly, 
for the construction of unidimen- 
sional tests. No general survey of this 
topic appears to have been made 
previously but much help was ob- 
tained from critiques by Loevinger 
(1948), Guttman (1950a, 1950b, 1950c),
and White and Saltz (1957). 

Definition of unidimensional tests. 
A unidimensional test may be defined 
simply as a test in which all items are 
measuring the same thing. A set of 
high jumps or a set of broad jumps is 
unidimensional. A mixture of high 
jumps and broad jumps is not. In 
psychological tests, however, items 
which appear to be of the same sort 
often turn out on closer investigation 
to be measuring different things so 
that this simple definition will not 
suffice for the construction of tests. 

A more precise definition is given 
by considering the answer pattern 
that would be yielded by a unidimen- 
sional test with infallible items. If 
the items are arranged in order of difficulty, placing the easiest first, it will be found that a person who fails the first will fail all the other items; a person who passes the first and fails the second will fail all the subsequent items; and so on. That is, the pattern
of responses for five items could only 
be one of the forms shown in Table 1. 

With fallible items, where the result may be affected by fluctuations in the ability of the subjects or in the difficulty of the items, a perfect answer pattern may not be found even when the items do systematically measure the same thing. For our purposes, however, it is sufficient to take the answer pattern of Table 1 as providing a working definition of unidimensionality, remembering that with fallible items the answer pattern will be disturbed by random error.

TABLE 1

PATTERNS OF RESPONSES FOR A UNIDIMENSIONAL TEST OF FIVE INFALLIBLE ITEMS*

Total          Item
score    1   2   3   4   5
  0      F   F   F   F   F
  1      P   F   F   F   F
  2      P   P   F   F   F
  3      P   P   P   F   F
  4      P   P   P   P   F
  5      P   P   P   P   P

*P = Pass, F = Fail.
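Table 1 can be generated mechanically, and a response pattern can be checked against it once the items are ordered from easiest to hardest. The sketch below assumes 0/1 scoring (1 = pass); `order_by_difficulty` and `is_scale_pattern` are illustrative names, not part of any published procedure.

```python
def order_by_difficulty(responses):
    """Item indices sorted from easiest (most passes) to hardest; rows = subjects."""
    n_items = len(responses[0])
    passes = [sum(person[i] for person in responses) for i in range(n_items)]
    return sorted(range(n_items), key=lambda i: -passes[i])


def is_scale_pattern(pattern):
    """True if a 0/1 pattern (items ordered easiest first) has the Table 1 form:
    all passes precede all failures, so the total score fixes the whole pattern."""
    total = sum(pattern)
    return pattern == [1] * total + [0] * (len(pattern) - total)


# Rebuild Table 1: one row per possible total score on five infallible items.
table_1 = [[1] * t + [0] * (5 - t) for t in range(6)]
for t, row in enumerate(table_1):
    print(t, " ".join("P" if x else "F" for x in row))

# The same data with the item columns shuffled; ordering by difficulty restores Table 1.
scrambled = [[row[2], row[0], row[4], row[1], row[3]] for row in table_1]
order = order_by_difficulty(scrambled)
reordered = [[row[i] for i in order] for row in scrambled]
print(all(is_scale_pattern(row) for row in reordered))
```

Because the definition is stated for infallible items, `is_scale_pattern` is all-or-nothing; with fallible items some departures from the pattern are to be expected.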

Criteria for evaluation. In evaluat- 
ing the methods, major consideration 
will be given to the extent to which a 
method provides for: 

1. A rational procedure for item selection

2. A criterion of unidimensionality 

3. An index or measure of unidi- 
mensionality 

A rational procedure for item selection is essential. Any method which provides no adequate indication of the most likely items to be discarded from the pool and which relies on a blind trial-and-error procedure to discover the unidimensional set of items will be hopelessly uneconomical for practical test construction. In general the method should be convergent so that the homogeneity of the item set increases as the procedure is applied and items are progressively removed from the original pool. Minor departures from this principle at critical stages (usually at the beginning) are permissible so long as the number of trials to reach a convergent state of affairs is not too large.

A criterion of unidimensionality is necessary so that checks can be made from time to time and decisions made either to continue culling of the item pool or to stop culling because a homogeneous set of items has been obtained.

An index of the closeness of approximation to unidimensionality is also required. Failure to find a set of items which meets a strict criterion of unidimensionality is certainly possible and, indeed, very likely. The items in question may, however, be more unidimensional than other measuring instruments available and would be preferable to a completely heterogeneous set of items alleged to measure the same attribute. The index of unidimensionality may be related to the procedure for selecting items and/or to the proposed criterion of unidimensionality, or, on the other hand, may be quite independent of either of them. It would be desirable for the sampling distribution of the index of unidimensionality to be known. It should be noted that the index of unidimensionality is not quite the same as the measures of reproducibility discussed by White and Saltz (1957). Reproducibility as defined by White and Saltz confounds reliability and dimensionality since the measures are affected by random errors as well as by systematic differences in item content. An index of unidimensionality appropriate to the definition used here should be independent of random error.

Methods to be reviewed. Explicit consideration will be given only to
classical item analysis, Loevinger’s 
technic of homogeneous tests, the 
independence criterion method, Gutt- 
man’s answer pattern method, and 
factor analysis. Most other methods 
are special cases of one or other of 
the listed methods and for the pur- 
pose of this review it is unnecessary 
to consider them. For example, 
criticisms of the Guttman procedure 
will apply also to the Cornell tech- 
nique (Guttman, 1947a) and H tech- 
nique (Stouffer, Borgatta, Hays, & 
Henry, 1952). Certain related techniques such as the Thurstone attitude
scaling methods give tests of uni- 
dimensionality as a by-product but 
as test construction methods they 
are subject to the same criticisms as 
classical item analysis and the inde- 
pendence criterion. 


CLASSICAL ITEM ANALYSIS 


Classical item analysis using an 
internal criterion attempts among 
other things to increase the average 
item-test correlation by selecting 
from the item pool those items which 
have the highest item-test correla- 
tion. It is well-known that this pro- 
cedure tends to increase the homo- 
geneity of the test. 
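Classical item analysis with an internal criterion can be sketched as follows for 0/1 items. This version uses the point-biserial item-total correlation (which needs no normality assumption) in place of the biserial coefficient discussed below, and the total includes the item itself; both are simplifying assumptions, and `select_items` is a hypothetical helper.

```python
import math


def item_total_correlations(responses):
    """Point-biserial correlation of each 0/1 item with the total score.

    Rows are subjects, columns are items; the total includes the item itself.
    Returns nan for items that everyone passes or everyone fails.
    """
    n = len(responses)
    totals = [sum(person) for person in responses]
    mean_t = sum(totals) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    rs = []
    for i in range(len(responses[0])):
        passer_totals = [t for t, person in zip(totals, responses) if person[i]]
        n_pass = len(passer_totals)
        if n_pass in (0, n) or sd_t == 0:
            rs.append(float("nan"))
            continue
        p = n_pass / n
        mean_pass = sum(passer_totals) / n_pass
        # r_pb = (M_pass - M_total) / SD_total * sqrt(p / q)
        rs.append((mean_pass - mean_t) / sd_t * math.sqrt(p / (1 - p)))
    return rs


def select_items(responses, keep):
    """Indices (in original order) of the `keep` items with the highest r."""
    rs = item_total_correlations(responses)
    usable = [i for i, r in enumerate(rs) if not math.isnan(r)]
    best = sorted(usable, key=lambda i: rs[i], reverse=True)[:keep]
    return sorted(best)


table_1 = [[1] * t + [0] * (5 - t) for t in range(6)]
print([round(r, 3) for r in item_total_correlations(table_1)])
print(select_items(table_1, 3))
```

Applied to the perfect pattern of Table 1, the middle item gets the highest correlation and the two extreme items the lowest, even though all five items are perfectly unidimensional: the attainable item-test correlation depends on item difficulty and on the distribution of test scores.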

From Table 1 it will be clear that 
for infallible items forming a unidi- 
mensional test the item-test correla- 
tion will be the maximum permitted 
by the shape of the distribution of 
test scores. With the answer pattern 
of Table 1 there is no overlap in the 
distribution of test scores for those 
who pass and those who fail a given 
item. The difference in mean test 
scores of passers and failers is thus a 
maximum and the biserial correla- 
tion between item and test is conse- 
quently maximized. It would appear 
then that if the culling of items pro- 
ceeds to the point where the item-test 
correlations are all maximized the
resulting test would be unidimen- 
sional. There are a number of diffi- 
culties which make this program un- 
likely to succeed. 

With fallible items the maximum 
item-test biserial will not be reached. 
One solution would be to correct the 
obtained biserials for attenuation 
using estimates of the reliability of 
item and test scores. Accurate esti- 
mates of the reliability of a single 
item are not easily obtainable. As- 
suming that this difficulty can be 
overcome a test would be regarded as 
unidimensional if the biserial correla- 
tions between item and test ap- 
proached the maximum after correc- 
tion for attenuation. 
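The correction for attenuation referred to here is Spearman's formula, which divides the observed correlation by the square root of the product of the two reliabilities. A minimal sketch; the numbers are hypothetical, and the text's proposal is to apply the correction to item-test biserials, which this function does not itself compute.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Spearman's correction: estimated correlation between true scores."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values: observed item-test r of .45, item reliability .30,
# test reliability .90.
print(round(correct_for_attenuation(0.45, 0.30, 0.90), 3))  # 0.866
```

The corrected value is only as good as the reliability estimates: with an item reliability of .20 instead of .30, the same observed r of .45 would be "corrected" to about 1.06, an impossible value for a correlation.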

Even granted the assumption that 
accurate estimates of item reliabilities 
can be obtained the method is not 
satisfactory. Consider the set of 
items with factor constitutions as 
follows: 


$$
\begin{aligned}
x_1 &= ma + nb + e_1\\
x_2 &= ma + nb + e_2\\
x_3 &= ma + nb + pc + e_3\\
x_4 &= ma + nb + qc + e_4\\
x_5 &= ma + nb + rc + e_5
\end{aligned}
$$

where a, b, and c represent different orthogonal common factors; m, n, p, q, and r are loadings; and $e_1, \ldots, e_5$ are error factors.

Lumsden (1957) has shown that 
Items 1 and 2 form a unidimensional 
subtest and that Items 3, 4, and 5 
with differing loadings on c are not 
unidimensional. Yet the method of 
maximizing item-test correlations will 
eliminate Items 1 and 2 first and no 
unidimensional test will be discovered. 
The only way out of this impasse 
would be to try sets at random which 
would make the procedure nonra- 
tional. 
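The argument can be checked numerically. Assuming unit-variance orthogonal factors, hypothetical loadings m = .5, n = .3, p = .4, q = .5, r = .6, and an error variance of .3 for every item, the factor model implies the item covariance matrix LLᵀ + diag(error variances), from which each item's correlation with the unweighted total follows directly. Items are treated as continuous here, so these are product-moment rather than biserial correlations; `implied_item_total_r` is a hypothetical name.

```python
import math


def implied_item_total_r(loadings, error_var):
    """Correlation of each item with the unweighted total, implied by a factor model.

    loadings: one row per item, loadings on orthogonal unit-variance factors.
    error_var: error (unique) variance of each item.
    """
    k = len(loadings)
    # Implied covariance matrix: L L^T plus error variances on the diagonal.
    cov = [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
            + (error_var[i] if i == j else 0.0)
            for j in range(k)]
           for i in range(k)]
    cov_with_total = [sum(row) for row in cov]    # cov(x_i, X) is a row sum
    var_total = sum(cov_with_total)               # var(X) is the sum of all entries
    return [cov_with_total[i] / math.sqrt(cov[i][i] * var_total) for i in range(k)]


# Hypothetical loadings on factors a, b, c for the five items in the text.
m, n, p, q, r = 0.5, 0.3, 0.4, 0.5, 0.6
loadings = [[m, n, 0.0], [m, n, 0.0], [m, n, p], [m, n, q], [m, n, r]]
rs = implied_item_total_r(loadings, [0.3] * 5)
for item, rit in enumerate(rs, start=1):
    print(f"Item {item}: r = {rit:.3f}")
```

With these values Items 1 and 2 come out lowest (about .71, against about .83 for Items 3 to 5), so culling on item-test correlation would discard the unidimensional pair first.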
For this method the criterion of
unidimensionality would be maxi- 
mum biserial after correction for at- 
tenuation. No sampling distribution 
of corrected biserials appears to be 
available so that the significance of 
departures from the perfect fit cannot 
be assessed. This is especially important in this case since the estimates of
item reliability on which correction 
is based are likely themselves to be 
quite unreliable. 

The logical measure of unidimen- 
sionality would be average corrected 
biserial. This would need to be con- 
sidered relative to the maximum ob- 
tainable biserial (biserial r has a maxi- 
mum of 1.0 only when the continuous 
variable is normally distributed). A 
ratio of corrected biserial to its maxi- 
mum similar to Loevinger’s H, sug- 
gests itself but the absence of a knowl- 
edge of its sampling distribution 
would restrict its value. 

An obvious possibility would be to 
use the Kuder-Richardson Formula 
20 with correction for variation in 
item difficulty suggested by Horst 
(1953). This statistic is, however, 
affected by random as well as systematic variance and is, therefore, a measure of reproducibility rather than an index of unidimensionality. There would seem nothing to prevent
the development of an index based 
on the ratio of obtained K-R 20 to 
the maximum K-R 20 for items with 
a given amount of random error. 
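For reference, Kuder-Richardson Formula 20 for 0/1 items is k/(k − 1) · (1 − Σ p_i q_i / σ²_X), where k is the number of items, p_i the proportion passing item i, q_i = 1 − p_i, and σ²_X the variance of total scores. The sketch below computes the uncorrected statistic only; neither Horst's correction nor the suggested ratio to a maximum K-R 20 is implemented.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for 0/1 item responses (rows = subjects).

    Uses population (divide-by-N) variances throughout; no correction for
    variation in item difficulty is applied.
    """
    n = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(person) for person in responses]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    if var_t == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_t)


table_1 = [[1] * t + [0] * (5 - t) for t in range(6)]
print(round(kr20(table_1), 3))  # 0.833
```

Even the error-free pattern of Table 1 yields a K-R 20 of only .833, because the statistic depends on the spread of item difficulties as well as on dimensionality; this is the kind of dependence that a correction for variation in item difficulty, or a ratio to an attainable maximum, is meant to remove.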

1 I am indebted to John Ross (University of Sydney) for this suggestion.

A search of the literature has not revealed any writer who has advocated the use of classical item analysis techniques as described above in order to produce unidimensional tests. Thorndike attempted to demonstrate the “homogeneity” of intellect CAVD by correlating scores on subgroups of items with scores on the total set of items and correcting the obtained r's for attenuation. Evidence was presented (Thorndike, Bregman, Cobb, & Woodyard, 1926, p. 566) that these corrected correlations approximated 1.0 and Thorndike concluded that this demonstrated the homogeneity of CAVD tests. The logic of Thorndike’s procedure is impeccable if applied to single items or to randomly selected subgroups of items, but his subgroups were arranged so as to have, like the total set, equal numbers of Completion, Arithmetic, Vocabulary, and Directions items. Thorndike was thus merely able to
show that the composite score obtained from his subsets was similar to the total score obtained from the complete set, but not that the subsets or the total set were homogeneous in the sense used here. It is only fair to point out that Thorndike was mainly concerned to show that his easier sets of items and his harder sets gave the same sort of results as the total set.

Wherry and Gaylord (1943) suggest as an alternative to factor analysis an iterative procedure based on classical item analysis. In this procedure each item is correlated with total score; those items with the highest correlations are selected and a new total formed; all items (including those not selected in the first stage) are then correlated with the new total and the procedure is continued until a stable group of items is obtained. White and Saltz (1957) recommend this method, but it would not appear to avoid any of the difficulties of classical item analysis.


LOEVINGER'S TECHNIC OF HOMOGENEOUS TESTS


Loevinger’s procedure is closely related to classical item analysis and indeed she indicates (Loevinger, 1947, p. 26) that the earlier work by Thorndike on the CAVD tests may have been influential in the development of her procedure.

The procedure is based on two statistics: the “homogeneity of an item with a test” and the “homogeneity of a test.” The first of these is to be used as a tool for item selection and is a development of Long’s (1934) index of overlapping. The formula for this is given by Loevinger as:

$$
H_{it} = 1 - \frac{2\,\Sigma(\text{``passes'' below or tied with ``fails''})}{PQ - \Sigma(\text{``passes'' one above ``fails''})}
$$


where P is the number passing the 
item and Q is the number failing the 
item. It is clear that for a perfectly 
unidimensional test as defined by 
Table 1, $H_{it}$ will equal 1.0 since there
will be no subjects who pass an item 
who will have scores below or tied 
with subjects who fail the item. Us- 
ing this statistic to cull a mixed set of 
items will, however, be subject to all 
the difficulties encountered with clas- 
sical item analysis. 

The index of unidimensionality is provided by the “homogeneity of a test,” $H_t$. Loevinger notes that for a perfectly heterogeneous test $p_{ij} = p_i$. That is, the probability of passing Item $i$, given that Item $j$ has been passed, is the same as the overall probability of passing Item $i$. For a perfectly homogeneous test as defined by Table 1, $p_{ij} = 1.0$ for all cases where $p_i > p_j$ (i.e., where Item $i$ is easier than Item $j$). From this it will be seen that $p_{ij}$ has a minimum value of $p_i$ for all cases where $p_i > p_j$.
Loevinger then considers the sum:

$$S = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \,(p_{ij} - p_i)$$

where $m$ is the number of items and the item pairs are all such that $p_i > p_j$.

This sum will have a maximum value given by

$$S_{\max} = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \,(1 - p_i)$$

for a perfectly homogeneous test and a value of zero for a perfectly heterogeneous test. To provide an index with the formal properties of a minimum of zero and a maximum of 1.0, Loevinger divides $S$ by $S_{\max}$ to give:


$$H_t = \frac{S}{S_{\max}} = \frac{\sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \,(p_{ij} - p_i)}{\sum_{i=1}^{m-1} \sum_{j=i+1}^{m} p_j \,(1 - p_i)}$$


Loevinger provides a formula for estimating $H_t$ from sample statistics. However, she points out that the sampling distribution is unknown and that the estimate is not even known to be
unbiased. 
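The index can be computed directly from a subjects-by-items matrix of 0/1 scores. Because $p_j p_{ij}$ is the proportion of subjects passing both items, each term of $S$ is the covariance of the two items. Each term of $S_{\max}$ is the largest covariance that the two items' difficulties permit.

The Python sketch below is only an illustration under stated assumptions:
- It uses sample proportions as plug-in values for the p's, so it is not Loevinger's own sample estimator.
- It assumes complete dichotomous data.
- The function name `loevinger_ht` is hypothetical.

```python
from itertools import combinations


def loevinger_ht(responses):
    """Loevinger's homogeneity of a test, H_t = S / S_max (plug-in sketch).

    responses: list of rows, one per subject, each a list of 0/1 item scores.
    Sample proportions are used as plug-in values for the p's.
    """
    n = len(responses)
    m = len(responses[0])
    p = [sum(row[k] for row in responses) / n for k in range(m)]
    # Order items easiest first, so p_i >= p_j for every pair (i, j) taken in order.
    order = sorted(range(m), key=lambda k: -p[k])

    s = 0.0      # sum of p_j * (p_ij - p_i), i.e. sum of inter-item covariances
    s_max = 0.0  # sum of p_j * (1 - p_i), the value for a perfectly homogeneous test
    for i, j in combinations(order, 2):  # i is the easier item, j the harder
        both = sum(row[i] * row[j] for row in responses) / n  # equals p_j * p_ij
        s += both - p[i] * p[j]
        s_max += p[j] * (1 - p[i])
    if s_max == 0:
        raise ValueError("H_t undefined: no pair of items can covary")
    return s / s_max
```

Consider a perfect Guttman pattern such as `[[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]`. For this pattern the function returns 1.0. For two items whose passes are statistically independent, it returns 0.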


INTERDEPENDENCE CRITERION 


Lazarsfeld (1950), Tucker (1952), 
and Lord (1952) have pointed out 
that with a unidimensional test the 
probability of success on one item is 
independent of success in any other 
item for subjects with the same true 
score. This is at first sight paradoxi- 
cal because it would seem obvious 
that items which are measuring the 
same thing should be highly corre- 
lated. But when only subjects of the 
same true ability are considered then 
items which are measuring this abil- 
ity and nothing else can differ only 
through error and will exhibit no 


JAMES LUMSDEN 


systematic variance. Suppose we take subjects who are exactly 6 feet tall. Different measures of their height will then differ only through error, so the measurements will be independent and unrelated. The independence criterion is undoubtedly valid and is more general than any other. It makes no assumptions about the distribution of ability or rectilinearity of regression.

The criterion suggests a procedure for constructing unidimensional tests. The steps would be:

1. Obtain responses on a pool of items from a large group of subjects.
2. Choose a number of subjects with the same total score.
3. Determine, say by χ², whether the items are independent or not.
4. If certain items turn out not to be independent, reject them.
5. Compute new totals for all subjects in the original group, determine a new group with the same total score, and repeat the χ² test.
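One step of this procedure can be sketched as follows, assuming dichotomously scored items. Subjects are grouped by obtained total score. This is the only score actually available, a difficulty the next paragraph takes up. For one pair of items, the sketch computes the ordinary Pearson χ² with one degree of freedom from the 2 × 2 pass/fail table within one score group.

The function name and its return convention are hypothetical. Comparing the statistic with the .05 critical value of 3.84 and repeating over all item pairs would complete one round of culling.

```python
from collections import Counter


def chi_square_at_score(responses, i, j, score):
    """Pearson chi-square (1 df) for independence of items i and j among
    subjects whose obtained total score equals `score`.

    responses: list of rows, one per subject, each a list of 0/1 item scores.
    Returns (chi2, n); chi2 is None if any margin of the 2 x 2 table is empty.
    """
    group = [row for row in responses if sum(row) == score]
    cells = Counter((row[i], row[j]) for row in group)
    a, b = cells[(1, 1)], cells[(1, 0)]  # both passed; only item i passed
    c, d = cells[(0, 1)], cells[(0, 0)]  # only item j passed; both failed
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return None, n
    return n * (a * d - b * c) ** 2 / denom, n


# Hypothetical two-item data; the four subjects who scored 1 form the test group.
data = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 1], [0, 0]]
print(chi_square_at_score(data, 0, 1, score=1))  # (4.0, 4)
```

Here the χ² of 4.0 exceeds 3.84, yet the two items could well be unidimensional. On two items, a subject who scores 1 must pass one item and fail the other, so the dependence is forced by conditioning on the obtained score.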

The true scores on the test are not, however, known, and the estimates obtained from the raw test scores are not satisfactory. O'Neil (1954) has shown that, for subjects with the same obtained score, items tend not to be independent but to be negatively correlated, even if the items are unidimensional. Take, for example, a test of only two items. For subjects with an obtained score of 1, the items have a tetrachoric correlation of -1.0: if a subject has the first item right, he must have the second one wrong, and vice versa. In mathematical terms, $p_{ij} = 0$ instead of $p_i$ as required by the independence criterion. The effect is known to decrease as the number of items is increased, and the independence criterion may therefore be workable for fairly large groups of items. With infallible items there is, of course, no problem, since true scores will then equal the obtained scores and the quoted example could not occur (if the items were
unidimensional). 
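The size of this artifact can be computed exactly in a simple special case. Assume, as the text does not, that there are m equally difficult, locally independent items. Then, among subjects with obtained score s, every pattern with s passes is equally likely. The proportion passing item i is therefore s/m, while the proportion passing item i among those who also passed item j is (s - 1)/(m - 1).

The sketch below enumerates the patterns to confirm this. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def conditional_pass_rates(m, s):
    """Exact (p_i, p_ij) for items 0 and 1 among subjects with obtained score s.

    Assumes m equally difficult, locally independent items, so every pattern
    with s passes is equally likely. p_i is P(pass item 0); p_ij is
    P(pass item 0 | passed item 1).
    """
    if m < 2 or not 1 <= s <= m:
        raise ValueError("need m >= 2 and 1 <= s <= m")
    patterns = [set(c) for c in combinations(range(m), s)]  # sets of passed items
    p_i = Fraction(sum(0 in pat for pat in patterns), len(patterns))
    passed_j = [pat for pat in patterns if 1 in pat]
    p_ij = Fraction(sum(0 in pat for pat in passed_j), len(passed_j))
    return p_i, p_ij


for m in (2, 4, 6, 10):
    s = m // 2
    p_i, p_ij = conditional_pass_rates(m, s)
    print(f"m = {m:2d}, s = {s}: p_i = {p_i}, p_ij = {p_ij}")
```

With two items, $p_{ij}$ is 0 against $p_i$ = 1/2, matching the example above. With ten items, it is 4/9 against 1/2, so the gap shrinks as m grows.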

Even if this problem is overcome, 
the culling of items is likely to prove 
arduous. All items in the pool are 
likely to be correlated on the first 
trial. In the absence of any knowl- 
edge about the number of items in 
the unidimensional set it is impossible 
to say whether the unidimensional 
items will be more or less intercorre- 
lated than the items it is desired to 
reject. No rational, convergent pro- 
cedure of item culling is available 
using the independence criterion. 

No special index of unidimension- 
ality is suggested for this method. 
This, of course, does not matter, since
if the method were otherwise suitable
an index could be borrowed from one 
of the other methods. 


ANSWER PATTERN METHODS 


The Guttman procedure (1944) is 
the most important of the answer 
pattern methods and is the only one 
discussed here. Some earlier writings 
by Walker (1931, 1936, 1940) and 
Ferguson (1941) contain the first explicit discussions of the relationship between answer pattern and other test characteristics, but no suggestions for test construction were made.
The answer pattern procedure consists essentially of inspecting the answer pattern and removing items so that the remaining items have patterns as near as possible to that of Table 1. For perfect items, this procedure could clearly be carried out easily, and inspection of the answer patterns would provide a clear-cut criterion of unidimensionality. For items which exhibit slight departures from unidimensionality, the procedure would be to eliminate items until the best possible approximation is obtained, consistent with retaining a sufficient number of items. For
this, some measure of the closeness of 
approximation to unidimensionality 
is required. Guttman uses the co- 
efficient of reproducibility which is 
the proportion of responses which can 
be correctly predicted from the total 
raw score. For a perfectly unidimen- 
sional test it will be seen from Table 1 
that the reproducibility coefficient 
will have the value 1.0. Guttman 
suggests that a test may be regarded 
as a “scale” (i.e., as unidimensional) 
if the coefficient of reproducibility 
exceeds .90. 
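A minimal sketch of the coefficient as defined above works as follows. Each subject with raw score r is predicted to pass the r easiest items and fail the rest. The coefficient is the proportion of all item responses that match this prediction.

This error-counting rule is an assumption. Other counting rules are used with scalograms and give somewhat different values. The function name is hypothetical.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for a 0/1 subjects-by-items matrix.

    A subject with raw score r is predicted to pass the r easiest items
    (highest proportion passing) and to fail the others; every response
    that disagrees with this prediction counts as one error.
    """
    n = len(responses)
    m = len(responses[0])
    p = [sum(row[k] for row in responses) / n for k in range(m)]
    easiest_first = sorted(range(m), key=lambda k: -p[k])  # ties keep column order

    errors = 0
    for row in responses:
        predicted_pass = set(easiest_first[:sum(row)])
        errors += sum(row[k] != (k in predicted_pass) for k in range(m))
    return 1 - errors / (n * m)
```

Consider the pattern `[[1, 1, 0], [1, 0, 1], [0, 0, 0]]`. The function gives 7/9, about .78, which falls below Guttman's .90 criterion. A perfect triangular pattern gives 1.0.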

The coefficient of reproducibility 
has been criticised severely by Fest- 
inger (1947) and Jackson (1949) be- 
cause it does not allow for the chances 
of obtaining high values when the 
items are heterogeneous (e.g., with 
only a few items of widely differing 
difficulties). Guttman (1947b) re- 
plied to criticism claiming that such 
factors as the number of answer cate- 
gories and the range of difficulty were 
taken into account before calculating 
the coefficient of reproducibility. 
Guttman does not give explicit rules 
but improvements to the reproduci- 
bility coefficient have been suggested 
by Jackson (1949) and Green (1954) 
which overcome some of the problems. 

The reproducibility coefficient, 
however modified, does not permit of 
a distinction being made between 
random and systematic scale dis- 
crepancies. Guttman claims (1950a, 
1950b, 1950c) that the distinction 
may be made by examining the pat- 
terns of scale discrepancies and pre- 
sents tables (p. 161) which purport to 
represent scale patterns for a perfect 
scale, a scale with random error, and 
a scale with systematic error. Evi- 
dence for random error in an item is 
said to be provided when scale errors 
are distributed randomly around the 
cutting point for the item; evidence
for systematic error when the scale 
errors are grouped in a systematic 
fashion. While this claim is undoubt- 
edly correct (such systematic group- 
ings are the basis of all the statistical 
analyses proposed for the problem) it 
is difficult to see how these groupings 
may be discovered by inspection and 
distinguished from random errors 
when the random errors are fairly 
large. 

Guttman (1950b) has explicitly de- 
nied any intention to use scale anal- 
ysis for the selection of items. His 
scalogram was designed merely to 
discover approximate cutting points 
for attitude scale items. Guttman in- 
deed claims that the task of scale 
analysis is to discover scales rather 
than to construct them and states 
that if a universe of attributes is 
scalable then any subset of items 
from that universe is scalable. Item 
culling is by this argument unneces- 
sary. The difficulty is that a test 
constructor (or discoverer) does not 
know precisely what “universe of 
attributes” he is sampling. Without 
precise definition he may sample a 
number of related universes. Item 
culling procedures are designed to 
distinguish between groups of items 
selected from different universes. 

It may be seen then that the an- 
swer pattern method provides no 
rational culling plan for use with 
fallible items. The index of unidi- 
mensionality provided by the plan is 
the coefficient of reproducibility 
which, despite improvements on the 
early Guttman form, does not dis- 
tinguish between systematic and 
random error. 


FACTOR ANALYSIS


It is difficult to give due credit to 
whoever first suggested the use of 
factor analysis in the construction of 
unidimensional tests. The idea is
sufficiently obvious to be thought at 
least implicit in the writings of 
Spearman, Thurstone (1947), and 
other early factorists. The factor 
analyses of test items by McNemar 
(1942), Burt and John (1943), and 
others clearly suggest it. Papers on 
related topics by Ferguson (1941), 
Wherry and Gaylord (1943, 1944), 
Carroll (1945), and Loevinger (1948) 
discuss with varying degrees of com- 
pleteness the possibility of factor 
analyzing items in test construction. 

Under restrictions which appear 
plausible for ability test items it is 
easy to show (vide Lumsden, 1957) 
that for a unidimensional test the 
matrix of tetrachoric item intercorre- 
lations is of unit rank. One factor 
analytic procedure for constructing 
unidimensional tests is to extract a 
single factor from the item intercor- 
relations, cull out the items which 
have large residuals, reanalyze, and 
continue until a satisfactory fit to a 
single factor solution is obtained. 
Wolfle (1940) in a well-known jibe at 
Brown and Stephenson (1933) said: 
“if one removes all tetrad differences 
which do not satisfy the criterion, 
the remaining ones do satisfy it” (p. 
9). That is exactly what is done in 
this factor analytic technique of con- 
structing unidimensional tests. The 
difference between the two situations 
is, of course, that Brown and Stephen- 
son had asserted that their tests, all 
of them, would meet the tetrad differ- 
ence criterion, while here it is merely 
hoped that a subset of items will meet 
the criterion. 
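The culling loop just described might be sketched as follows. The sketch makes these simplifying choices:
- It extracts the single factor by the centroid formula: each loading is a column sum of the correlation matrix divided by the square root of the grand total.
- It assumes the diagonal already holds communality estimates.
- At each step it drops the item with the largest mean absolute residual.
- It stops when no residual exceeds a tolerance.

The function name, the tolerance of .05, and the minimum of three items are illustrative assumptions, not part of the method as published.

```python
import math


def cull_to_one_factor(r, tol=0.05, min_items=3):
    """Fit one centroid factor, drop the worst-fitting item, and repeat.

    r: square correlation matrix (list of lists) whose diagonal already holds
       communality estimates. Returns the indices of the retained items.
    tol: largest absolute residual accepted as a fit to one factor.
    min_items: stop culling once this many items remain.
    """
    if len(r) < 2:
        raise ValueError("need at least two items")
    items = list(range(len(r)))
    while True:
        # First centroid factor: each loading is a column sum over the root of the grand total.
        col = {j: sum(r[j][k] for k in items) for j in items}
        total = sum(col.values())
        if total <= 0:
            raise ValueError("centroid undefined: grand total is not positive")
        load = {j: col[j] / math.sqrt(total) for j in items}

        # Off-diagonal residuals r_jk - a_j * a_k, summarized for each item.
        mean_abs, worst = {}, 0.0
        for j in items:
            res = [abs(r[j][k] - load[j] * load[k]) for k in items if k != j]
            mean_abs[j] = sum(res) / len(res)
            worst = max(worst, max(res))

        if worst <= tol or len(items) <= max(min_items, 2):
            return items
        items.remove(max(items, key=lambda j: mean_abs[j]))  # cull the worst item
```

Consider a hypothetical matrix in which three items load .8 on one factor and a fourth correlates .64 with only the first of them. The loop drops the fourth item and then stops, because the remaining three fit one factor exactly.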

The procedure is quite simple. But is the culling procedure rational? Will the set of items converge to unidimensionality?


It is evident that convergence of the factor analytic procedure to a unidimensional subset of items cannot be guaranteed. If the unidimensional set is much less numerous than
the heterogeneous items in the pool 
then it is probable that the unidi- 
mensional set will not have sufficient 
influence on the nature of the first 
factor extracted to prevent the oc- 
currence of large residuals among the 
unidimensional set. These items will 
be discarded first and the procedure 
will not converge to a single factor 
solution. 

If, however, the items are carefully 
preselected on empirical and a priori 
grounds, it seems likely that the state 
of affairs of the preceding paragraph 
will not occur. If items are deliber- 
ately made parallel or if there is evi- 
dence for parallelism then it would 
follow that the dimension of any 
unidimensional test and the dimensions of the heterogeneous items in the total pool will normally be highly
correlated. In this circumstance the 
influence of the unidimensional set on 
the first factor extracted may well be 
greater than the actual numbers of 
items suggest, and the method may 
therefore be expected to converge. 
The procedure of preselecting will 
also tend to increase the size of the 
unidimensional set in the pool and 
this will also increase the probability 
of convergence. 

Lumsden (1959) found that four subsets of number series items, selected on a priori grounds, converged. Three of them met a fairly stringent test of unidimensionality when cross-validated with a fresh group of subjects.

One procedure should almost guarantee convergence, provided a sizable unidimensional set exists. It is to carry out a preliminary complete centroid analysis and then to select for further analysis those items which appear in strips (i.e., roughly co-linear) in the factor space. This appears to be the procedure advocated by
Cattell (1957) for his factor homo- 
geneous scale except that he would 
require the additional restriction that 
the factor have significance in a more 
general factor space than that pro- 
vided by the item intercorrelations. 
The complete centroid procedure with 
rotation could indeed be used with- 
out further analysis except that the 
problems of estimating communali- 
ties and determining goodness of fit 
are more complicated than for the 
unit rank case. 

The criterion of unidimensionality 
suggested for item culling is the size 
of the residuals. This must be con- 
sidered with relation to the sampling 
distribution of residuals. Unfortu- 
nately there is no exact solution to 
this problem. Many methods have 
been suggested (Cattell, 1952) but 
none can be regarded as satisfactory. 
A reasonable solution for test con- 
struction purposes would be to use 
one of the simpler procedures (e.g., 
standard error of average r) and ap- 
ply it rather severely. Increased 
availability of automatic computing 
services may permit the use of maxi- 
mum likelihood methods of factoriz- 
ing which provide a test for rank. 

An index of unidimensionality appropriate to the method is the ratio
of first factor variance to total bipolar 
factor variance after a complete 
centroid analysis with subjects who 
were not used for item selection. In 
most cases the ratio of first to second 
factor variance would seem to give a 
reasonably useful index. This index 
has no fixed maximum value and 
little is known about the extent to 
which it may be affected by errors of 
sampling or of measurement. 
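A sketch of this index follows, with one substitution stated plainly. Instead of a complete centroid analysis, it takes the two largest eigenvalues of the correlation matrix as the first and second factor variances. These are principal-axis eigenvalues, computed here by power iteration with deflation.

The matrix should come from subjects not used for item selection, as the text requires. The function names, the starting vector, and the iteration count are illustrative assumptions. As noted above, the ratio has no fixed maximum.

```python
import math


def top_two_eigenvalues(r, iters=500):
    """Two largest eigenvalues of a symmetric positive semidefinite matrix,
    found by power iteration followed by deflation."""
    n = len(r)
    a = [row[:] for row in r]
    values = []
    for _ in range(2):
        v = [1.0 + 0.1 * k for k in range(n)]  # unequal start weights (arbitrary choice)
        lam = 0.0
        for _ in range(iters):
            w = [sum(a[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
            # Rayleigh quotient v' A v for the current unit vector.
            lam = sum(v[i] * sum(a[i][k] * v[k] for k in range(n)) for i in range(n))
        values.append(lam)
        # Deflation: subtract the component just extracted.
        a = [[a[i][k] - lam * v[i] * v[k] for k in range(n)] for i in range(n)]
    return values


def first_to_second_ratio(r):
    """Ratio of first to second factor variance (eigenvalues stand in for centroid factors)."""
    first, second = top_two_eigenvalues(r)
    if second <= 0:
        raise ValueError("second factor variance is zero; the ratio is undefined")
    return first / second
```

For two items correlating .5, the eigenvalues are 1.5 and .5, so the ratio is 3.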


DISCUSSION


It seems clear that none of the 
methods examined can be regarded as 
satisfying all three of the main criteria.
Only factor analysis appears to offer 
a rational procedure for item selec- 
tion. The criteria and indices of uni- 
dimensionality are unsatisfactory for 
all methods. 

This review has considered each of 
the methods as if they were complete, 
self-consistent creations of a single 
writer. With the exception of the 
Guttman answer pattern method and 
the Loevinger method this is not so. 
The various “natural” criteria and 
indices suggested for each of the 
methods are not necessary conse- 
quences of the choice of item selection 
method. Combinations of different 
elements from different methods are 
possible, and this circumstance justifies a modified optimism. Suppose, for instance, that a modification of the coefficient of reproducibility produced an acceptable index of unidimensionality. That would not be cogent evidence for adopting an answer pattern method, but it would greatly improve all methods.

Greatest emphasis has been deliber- 
ately placed on item selection ra- 
tionale since this topic appears to 
have been relatively neglected in the 
SUMMARY


For the present purposes we shall
take our definition of Q methodology 
from Stephenson (1953), not be- 
cause he has been succinct, but be- 
cause he has been more comprehen- 
sive in his interest than other writers. 
Accordingly, Q as conceived by Cat- 
tell or Mowrer and related methodo- 
logical topics discussed by such 
writers as Cronbach are not included 
in the present review of recent 
studies. 

In his 1953 publication, The Study of Behavior, Stephenson informs us, with a modesty characteristic of this book, that “. . . the science of behavior can be immeasurably improved by attending to a few principles upon which we have based the method now well known as ‘Q-technique’” (p. 1). Time does not
permit the long series of quotations 
which would be necessary in order to 
indicate fully Stephenson’s concept 
of Q methodology, but a few addi- 
tional quotations remind us that he 
was definite in his point of view. 

Our object has been to make it possible for 
studies to be undertaken on single cases (p. 2).1 
Briefly, a statement of the kind “All crows 
are black” is a general proposition. To say 
that “A crow is black” is clearly singular, but 
not testable. When, however, we can point to 
a particular crow X and assert that it is black, 
a singular testable proposition is at issue 
(p. 42). There never was a single matrix of 
scores to which both R and Q apply (p. 15). 


1 “By a ‘single case’ we mean, for the mo- 
ment, a single person under study or a single 
group of interacting persons. ... what is in- 
volved is whether individual differences are 
postulated or whether singular propositions 
are being tested. The latter alone are our con- 
cern” (Stephenson, 1953, p. 2). 
A defining summary of Stephenson’s view of Q methodology could
include at least six points: 

1. Q method appears to require 
ipsative variables, particularly Q 
sorts. 

2. Q method lends itself to corre- 
lations between people or between 
different conditions for the same per- 
son. 

3. Q method requires a concep- 
tually structured set of statements in 
order to interpret the correlations be- 
tween people—each set of statements 
comprising systematic combinations 
of different levels of the various hypo- 
thetical effects. 

4. Q method permits a study of a 
person by means of analysis of vari- 
ance of the statements, assuming that 
the sorted statements were initially 
structured as replications of the pos- 
sible combinations of a priori effects 
and levels of reaction. 

5. Q method favors a dependency 
type emphasis in factor analyses with 
rotations determined by the nature 
of the propositions concerning the 
variates. 

6. Q method leaves unanswered 
the question of the parent population 
from which the individual is drawn; 
the method examines singular prop- 
ositions on the assumption that some- 
where there are more people like the 
one under scrutiny. 

To date the most conspicuous use of Q methodology has been made by the so-called “self psychologists,” who view discrepancies between one's self-perception and the perception of an ideal self as an indication of maladjustment. This interpretation
of psychological maladjustment is 
consistent with Rogers’ belief in the 
self-actualizing function of the per- 
sonality and, as a consequence, finds 
direct application in his studies of the 
efficacy of psychotherapy (Rogers & 
Dymond, 1954). The work done by 
the Chicago group during the first 
half of the present decade used the 
now familiar device of correlating the 
individual’s Q sort describing his true 
self with his Q sort for his ideal self. 
Increases in these correlations dur- 
ing the course of therapy were taken 
as evidence of improvement. This 
was a notable application of Q meth- 
odology because in the opinion of 
many persons these studies com- 
prised the first acceptable indication 
that psychotherapy was efficacious 
in producing personality change. 
Beyond using Q methodology in es- 
tablishing this important landmark, 
the Chicago group was able to illuminate some of the features of the psychotherapeutic process by factor analyzing the intercorrelations among various Q sorts for a given patient. The case of Mrs. Oaks is illustrative. During the course of therapy her concept of self changed considerably, and there were some changes in her concept of an ideal self. Her therapeutic progress was summarized by a factor analysis of the intercorrelations among Q sorts made at various stages in the course of her therapy.
There have been several evaluations of the validity of Q sorts as measures of adjustment, particularly from the standpoint of their appropriateness as therapeutic criteria. For example, Friedman (1955) supported the self-ideal congruency concept of good adjustment in a study which involved a comparison between normals and neurotics. The neurotic group was described as tending to regard their
self-qualities as very much different 
from the way they would like them 
to be. Cartwright (1957) emphasized 
the consistency interpretation of 
good adjustment by showing that 
after psychotherapy subjects de- 
scribe themselves in relation to im- 
portant persons in their environment 
with as much consistency as controls. 
An increase in self-ideal congruence 
for a group of high school boys after 
counseling was reported by Caplan 
(1957), and Turner and Vanderlippe 
(1958) reported that college students 
with high self-ideal congruence 
tended to have more extracurricular 
activities and to have higher scholas- 
tic averages than students with low 
self-ideal congruence. 
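The self-ideal correlations in these studies are ordinary Pearson correlations computed across statements. Each statement's pile number serves as its score in each sort. The sketch below assumes a hypothetical nine-pile forced distribution for 20 statements; actual studies used their own item sets and pile sizes.

A forced distribution fixes each sort's mean and spread. Only the arrangement of statements therefore affects the correlation, which is one way of seeing that the scores are ipsative. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical forced distribution: pile value (1 = least like, 9 = most like) -> statements.
FORCED = {1: 1, 2: 1, 3: 2, 4: 3, 5: 6, 6: 3, 7: 2, 8: 1, 9: 1}  # 20 statements


def follows_forced(sort):
    """True if a sort (one pile value per statement) matches the forced distribution."""
    return Counter(sort) == Counter(FORCED)


def q_correlation(sort_a, sort_b):
    """Pearson r across statements between two sorts (same statement order in both)."""
    ma, mb = mean(sort_a), mean(sort_b)
    cov = sum((a - ma) * (b - mb) for a, b in zip(sort_a, sort_b)) / len(sort_a)
    return cov / (pstdev(sort_a) * pstdev(sort_b))


# Hypothetical sorts: the "ideal" sort is the mirror image of the "self" sort.
self_sort = [pile for pile, count in FORCED.items() for _ in range(count)]
ideal_sort = [10 - pile for pile in self_sort]
print(follows_forced(ideal_sort), q_correlation(self_sort, ideal_sort))
```

Because the hypothetical distribution is symmetric, the mirror-image sort is also a legal sort. Its correlation with the self sort is -1, the lowest congruence possible.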

The apparent validity of self-ideal 
congruence was examined by Chase 
(1957), who compared adjusted and 
maladjusted hospital cases with re- 
spect to the various possible correla- 
tions involving Q sorts. Only those 
correlations containing the self-sort 
distinguished between the adjusted 
and the maladjusted group. 

The Q sort approach to adjustment 
is subjected to further scrutiny by 
Kogan, Quinn, Ax, and Ripley (1957). 
Using two comparable samples, one 
psychiatric patients and the other 
university students, sorts for four 
different conditions were obtained. 
The average sorts for the patient and 
the student groups were correlated 
for each of the four conditions. It 
was found that a great portion of the 
variance in these correlations could 
be accounted for in terms of either of 
two extraneous variables, the social 
desirability of the sorted statements 
or a sickness-health variable. Edwards (1955) had already described the importance of social desirability in Q sorts.
There are reports which challenge
the appropriateness of Q sorts as
direct evidence of the efficacy of
psychotherapy. For example, Taylor 
(1955) undertook a study on the as- 
sumption that repeated introspec- 
tion would produce therapeutic type 
changes in self-concept. His subjects 
made repeated Q sorts. In conse- 
quence of this procedure, there was 
an increase in positive self-concepts, 
and the self-ideal correlation in- 
creased. From this one need not in- 
fer that self-concepts are unstable, 
however. Engel (1959) examined the 
self-concepts of a group of adolescents 
in 1954 and again in 1956, and re- 
ported that items relative to positive 
self-concepts had appreciable stabil- 
ity as indicated by a stability corre- 
lation of .69. 

Levy (1956) challenged the mean- 
ing of self-ideal discrepancies by 
comparing self-ideal correlations
based on the Butler and Haigh (1954) 
items with the correlation between 
sorts for an actual and an ideal home 
town. He found these two sets of 
actual-ideal correlations to be corre- 
lated with each other to the order of 
.70. Because of this, he suspects that
the discrepancies perceived between 
actual and ideal states of affairs have 
implications for the individual’s ad- 
justment, regardless of the area in 
which the discrepancy is shown. 

Although Q sorts are frequently 
employed in the published literature, 
the use is often relatively uncritical. 
In some instances it would appear 
that a normative procedure would 
have served the investigator’s pur- 
poses better than the ipsative Q 
sorts. It appears probable, however, 
that investigators have been en- 
couraged by the availability of Q sort 
procedures, and some of the resulting studies might not have been undertaken if only a normative type emphasis had been available. For example, in Stewart’s (1958) study of
the relationship between manifest 
anxiety and mother-son identifica- 
tion, facets of mother-son identifica- 
tion are readily revealed and usefully 
quantified by correlations between 
various sorts provided by the mothers 
and their sons. Stewart found that 
the boys with the greatest manifest 
anxiety were those with the greatest 
discrepancy between their self-per- 
ceptions and their mother’s ideal for 
them. 

Correlating Q sorts was a con- 
venient device for Kalis and Bennett 
(1957), who wished to show that 
communication between the patient 
and members of his family was im- 
proved for those patients whose hos- 
pitalization had been successful. The 
importance of similarity of self-per- 
ceptions in interpersonal relation- 
ships is further illuminated by Cor- 
sini’s (1956) use of Q sorts in his 
study of happiness in marriage. 
These studies are reminiscent of a re- 
port by Revie (1956), who used Q 
sorts to describe both the teacher's 
and the school psychologist’s concept 
of pupils. As a result of their inter- 
action, both the teacher’s and the 
psychologist’s concept of the pupil 
changed. 

Q sorts have been used in many 
different ways, particularly in the 
study of personality. Shontz (1956) 
used Q sorts in order to examine the 
concept of a healthy personality, 
while Reznikoff and Toomey (1958) 
worked out a system of weightings 
whereby observers’ Q sorts of patients 
may be scored to estimate the degree 
of emotional disturbance. Epstein 
and Smith (1956) used Q sorts as a sociometric device by having students Q sort their fellows with respect to the degree of hostility in their behavior. Fiske and Van Buskirk (1959) used Q-sort procedures in
order to examine the stability of sen- 
tence completion test interpretations, 
and Doleys and Kregarman (1959) 
report that self-ideal congruence does 
not measure frustration tolerance. 
Nahinsky (1958) used a self-ideal 
comparison to distinguish career 
from noncareer naval officers, and 
Whiting (1959) had nurses, patients, 
and physicians sort statements con- 
cerning the importance of various 
aspects of the nurse’s work. This appears to be the kind of study where
rating scales, inventories, or check 
lists could not have served the in- 
vestigator’s purposes as well as the 
ipsative sort. 

The unique value of Q sorts has 
not been made sufficiently explicit to 
permit an investigator to know the 
kinds of situations which call for 
ipsative procedures and the kinds of 
situations in which his purposes will 
be better served by normative proce- 
dures. There are numerous studies in 
the literature which employ Q sorts 
without indicating why this particu- 
lar method was chosen. Sometimes 
it appears that Q sorts are used because no reliable normative instrument is available to distinguish between persons along a relevant continuum. The question of the reliability or the validity of the Q sort is rarely raised. Judging from practice alone, one could infer that reliable and valid ipsative distinctions based on a Q-sort procedure are easier to establish than reliable and valid normative procedures. Your reviewer has not seen material which would lead to this conclusion. But even if it were true, one could still question the use of an ipsative procedure showing intra-individual differences when the general requirements of the investigation appear to call for a normative procedure dealing with inter-individual differences. For example, Morsh (1955) described the
use of a Q-sort procedure in securing 
the classes’ evaluation of the teachers. 
Why an ipsative type evaluation of 
teachers is preferable to a normative- 
type procedure is not indicated in his 
report. 

In a study of the relationship be- 
tween some personality variables and 
speed in decision making, Block and 
Peterson (1955) used the staff's Q 
sort of the subject as a measure of 
personality. Although the results of 
this investigation are interesting and 
worthwhile, it would appear that the 
emphasis is one of individual dif- 
ferences and that a normative meas- 
ure of personality would have been 
the logical choice. Both Cattell 
(1944) and Guilford (1954) have 
warned that ipsative measures should 
not be used in attempts to study in- 
dividual differences. The amount of 
error in such a maneuver need not be 
invariably great, however. For ex- 
ample, Block (1957) matched items 
from a Q sort with items that were 
used in a normative-type rating 
and found that in one sample the 
correlations between various items 
ranged from .63 to .88, while similar 
correlations for another sample 
ranged from .31 to .74. Apparently,
the error involved in using ipsative 
item scores in a normative manner 
may vary greatly from item to item 
and from sample to sample. 

In addition to their applications in 
various studies of personality, Q 
sorts are also applied in the study of 
psychopathology. For example, 
Rogers (1958) found that the self- 
ideal congruence for paranoid schizo- 
phrenics was greater than for nor- 
mals. His approach is noteworthy 
because of its novelty. Instead of 
having his subjects sort cards, he 
asked them to manipulate a red
square over a blue square, with the 
red square representing the self, the 
blue square representing the ideal, 
and the overlap representing the de- 
gree of congruence. Although the 
spatial interpretation that the sub- 
ject gives his judgment is absolute 
and could lend itself to normative 
treatment, the sense of these manip- 
ulations is clearly ipsative. This 
study, which was published in 1958, 
showed a high degree of self-ideal 
congruence for paranoids and can be 
compared with Friedman’s 1955 
study which included a sample of 
paranoid schizophrenics. Friedman 
found that only 3 of his 16 paranoids 
showed a low self-ideal correlation. 

Other schizophrenics are much 
more distinctive with respect to their 
behavior in the Q-sort situation. For 
example, Helfand (1956) asked vari- 
ous subjects, including schizo- 
phrenics, to simulate the Q sort of a 
former patient whose autobiography 
they read. He then computed the 
correlations between the sorts pro- 
vided by his various subjects and the 
sort provided by the former patient. 
He found that the schizophrenics’ 
simulated sorts had the lowest cor- 
relations of all. He ascribes this to a 
limitation in role-taking ability. A 
recent paper by Fagan and Guthrie 
(1959) tells us more about the 
schizophrenics. Their subjects were 
asked to describe themselves in one 
sort and to describe an average per- 
son in another. The subjects were in- 
tercorrelated for the two different 
sorts, and the two sets of intercorre- 
lations were factor analyzed. The 
authors concluded that schizo- 
phrenics, like many other patients, 
view themselves differently from the 
way they view other persons. 

The mothers of schizophrenic pa- 
tients have also been studied by Q 
methodologists. Shepherd and Guthrie
(1959) had the mothers of 20
male schizophrenics sort 100 items 
concerning children and family life. 
These sorts made it possible to cor- 
relate each mother with every other 
one. The correlations were factor 
analyzed and the resulting factors— 
identified as Detached Authoritar- 
ianism, Inadequacy and Inconsist- 
ency, Pervasive Control, Sophis- 
ticated Denial of Inadequate Mother- 
ing, and Annoyance and Rejection— 
broaden our view of the various 
qualities or dimensions of schizophrenic
mothering. One immediately becomes
interested in how one might generalize
from these mothers of schizophrenics
to other mothers and thereby gauge
how broadly applicable such dimensions
of schizophrenic mothering might be.
Unfortunately, this is one of the points
at which Q methodology is weakest. We
do not know what population the
individual or individuals under
scrutiny represent. Stephenson (1953)
seems to feel that this does not really
matter as long as he can assume that
there are other similar individuals
somewhere. He calls this way of
ducking the practical issue testing a
“singular proposition.”
There are many published studies
which involve factor analyzing
intercorrelations between persons.
According to Stephenson’s criteria,
however, only a few of these would
qualify as applications of Q
methodology. Since Stephenson states
that there is no matrix of correlations
which can be studied by both R and
Q methods, one is inclined to conclude
that one can appropriately
intercorrelate persons for Q purposes
only when the similarity of the persons
is expressed by a correlation based on
ipsative scales, i.e., scales on which
people have distinguished between
items, and not necessarily scales which
distinguish between people on any
normative basis of individual
differences. Thus, your reviewer’s
factor analyses of various diagnostic
groups, although designed to show that
different varieties of patients may
have the same diagnosis, should not be
considered applications of Q
methodology, because the correlations
between persons were based on standard
rating scales designed to show
individual differences. There are many
such obverse factor analyses, and
although they are commonly called Q
studies, they do not meet Stephenson’s
criteria. The Bendig and Hamlin
(1955) investigation of Rorschach
scoring categories is another study
of this type.
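The distinction between correlating items and correlating persons amounts to reading the same score matrix in two directions. The sketch below assumes three hypothetical persons, each sorting the same six items into an identical fixed distribution. Every person's scores therefore share the same mean and variance, as ipsative Q scores do. The Q-type matrix correlates persons across items, in the manner of the Shepherd and Guthrie analysis, and the R-type matrix correlates items across persons. Factor analysis of either matrix would be a further step and is not shown.

```python
from statistics import fmean


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(vectors):
    """All pairwise Pearson correlations among equal-length vectors."""
    return [[pearson(a, b) for b in vectors] for a in vectors]


# Hypothetical Q sorts: each person places the same 6 items into a fixed
# distribution of pile values (1, 2, 3, 3, 4, 5); higher = more characteristic.
sorts = {
    "A": [5, 4, 3, 3, 2, 1],
    "B": [4, 5, 3, 2, 1, 3],
    "C": [1, 3, 2, 5, 4, 3],
}
persons = list(sorts)
rows = [sorts[p] for p in persons]

# Q-type: correlate persons (rows of the score matrix) across items.
q_matrix = correlation_matrix(rows)

# R-type: correlate items (columns of the same matrix) across persons.
items = [list(col) for col in zip(*rows)]
r_matrix = correlation_matrix(items)

for p, row in zip(persons, q_matrix):
    print(p, " ".join(f"{r:5.2f}" for r in row))
```

With these invented sorts, person A correlates .60 with B and -.50 with C. The R-type matrix is computed from exactly the same numbers, read by columns instead of rows.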

Perhaps the most valuable ap- 
plications of factor analysis in Q 
methodology may come from studies 
of therapeutic phenomena. The pos- 
sibilities of such an approach were 
anticipated as early as 1951, when 
Fiedler published a factor analytic 
study of differences between thera- 
pists from different schools and with 
different levels of training. Despite 
the potential of such studies for help- 
ing to place psychotherapy on a ra- 
tional, empirically verifiable basis, 
only a few students of psychotherapy 
appear to be ready to study thera- 
peutic phenomena with the sys- 
tematic planfulness which Q method- 
ology could facilitate. In one such 
study, Nunnally (1955b) had a thera- 
pist describe a patient by means of 
Q sorts on eight successive occasions. 
The factor analysis of the intercorrelations
among these sorts yielded two factors,
one concerning relationships with the
therapist and the other relating to
intrapersonal confidence.

The Peterson, Snyder, Guthrie, 
and Ray (1958) investigation of 
therapeutic biases provides a promising
exploration. They approached
their study in a sound manner by 
thoughtfully structuring the sample 
of statements which comprised their 
Q sorts, systematically using varia- 
tions of such hypothetical dimensions 
as direction of gain, attitudes, mode 
of change, and area of conflict. The 
sample of therapists who were inter- 
correlated was drawn from graduates 
of their own program so that one is 
not left up in the air with respect to 
the population of persons to whom 
the results may be generalized. As 
is usual in such studies, the factors 
were interpreted on the basis of the 
items which received a characteristic 
sort by persons who had high load- 
ings on the factor. The practice of 
interpreting persons in terms of items
smacks of R methodology and re- 
minds us that people are usually 
more distinguished by their behavior 
than behavior is distinguished by the 
people who perform it. 

Thrush published an interesting 
study in 1957. Using a sample of 60 
statements descriptive of problems 
encountered by a counseling agency, 
the staff made sorts of the level and 
kind of service each problem would 
require. These sorts were made in 
1952 and again in 1956. On the basis 
of these sorts, the members of the 
staff were intercorrelated for each of 
the years and the two sets of inter- 
correlations were factor analyzed 
separately. A comparison of the re- 
sults indicated that the emphasis in 
the agency had shifted from voca- 
tional counseling to personal adjust- 
ment counseling. Although studies 
of this kind are illuminating, they re- 
mind us that we have no rigorous 
basis for comparing the results of 
factor analyses to test an exact 
statistical hypothesis. The question 
of how one should generalize from a 
Q-type study is usually disregarded.
Conger, Sawrey, and Krause (1956) 
point to an aspect of this problem in 
their study of Beck’s “The Six
Schizophrenias” (1954).

In commenting upon factor anal- 
ysis in Q methodology, one should re- 
member that Stephenson indicated 
that the correlations should be in 
part expressive of the effect of dif- 
ferent kinds of operations. He in- 
tended that the intercorrelated vari- 
ates should, in some manner or an- 
other, be regarded as dependent 
variables in an experimental sense 
and not merely descriptive dimensions
of a static situation. From the
standpoint of this emphasis, the 
Sweetland and Frank (1955) study 
of ideal psychological adjustment is 
not a good example of Q methodology 
because its purpose appears to be to 
describe kinds of psychological ad- 
justment rather than to reveal the 
effects of certain operations, i.e., it is 
not a dependency-type analysis. This 
descriptive use of the Q-type factor 
analysis is not unique to Sweetland 
and Frank, however; other examples 
would include Broen’s (1957) factor 
analytic study of religious attitudes. 

Many of the samples of statements 
which have been sorted in Q meth- 
odology appear to have been some- 
what informally assembled, and as a 
consequence, the analyses performed 
on the sorts provided by various per- 
sons or by the same person under 
various instructions have an uncer- 
tain meaning. We do not know from 
what parent population of behavior 
they might conceivably be drawn or 
from what specific theory they could 
have been generated. It is probably 
for this reason that we find relatively 
few studies where the Q sort arrays 
for an individual or a group of in- 
dividuals are submitted to an analysis 
of variance. This is unfortunate because
the difference or similarity between
the Q sorts of two individuals
or the Q sorts of an individual under 
two or more conditions must be ex- 
plained in terms of the sorted items 
which comprise the Q arrays. If the 
items had been included in the sample 
as a priori representatives of theo- 
retically relevant classes of behavior, 
then the order given to the items 
could in the case of any given Q sort 
be entered into an analysis of vari- 
ance. In this way the relative status 
which the sorter assigned to various 
a priori classes of items could be re- 
vealed. In many studies, however, 
a defensible a priori classification of 
behavior with respect to kinds and 
levels is not possible because the area 
of inquiry is not well known, no sys- 
tematic theory can be confidently ap- 
plied, and in a sense the investigation 
is exploratory. If, in the study of 
such an area of behavior, Q method- 
ology were indicated, it would seem 
desirable first to intercorrelate and 
factor analyze the items in the R 
tradition. Then a sample of state- 
ments for Q sorts could be arranged 
so that the various factors could be 
represented in a balanced design. 
From such structured samples of 
statements, the Q methodology could 
be applied in the recommended man- 
ner by first factor analyzing the 
variates (e.g., people) and then ex- 
plaining the Q factors in terms of an 
analysis of variance of the sorts pro- 
vided by the variates. The reviewer 
saw no studies where the domain of 
behavior was first explored by an 
R-type analysis as a basis for build- 
ing a structured sample of state- 
ments for the Q sorts. Where anal- 
ysis of variance had been applied to 
Q sort arrays, the investigator had 
carefully structured his sample on an 
a priori basis. Such studies are few
and tend to be found in the recent 
literature. 

One of the earlier studies involving 
an analysis of variance was provided 
in 1956 by Kerlinger, who constructed
a set of Q statements representing
two kinds of educational attitudes 
interpreted at four different levels 
each. The levels for each class were 
then systematically replicated with 
10 statements each, so that there 
were 80 statements in all. 

In 1958, Rawn published a study 
of transference and resistance in 
psychotherapy. The statements to 
be sorted conformed with the re- 
quirements of a balanced block de- 
sign involving two levels of resistance 
and three classes of transference. 
These categories of class and level 
could be combined to form six kinds 
of statements. Each of these types of 
statements was interpreted in 15 dif- 
ferent ways to form the replications, 
and accordingly the structured sam- 
ple comprised 90 statements in all. 
These statements were sorted by
different raters and for different
sessions of recorded psychotherapy.
Because of the way the sample was
structured, the investigator could
perform an analysis of variance for
the various sorts as well as factor
analyze the intercorrelations among
the sorts. His purposes required only
the analysis of variance, however.
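A structured sample of the kind Rawn used lends itself to a two-way analysis of variance within a single sort. The sketch below assumes a hypothetical sort of 90 statements classified by two levels of resistance and three classes of transference, with 15 statements per cell (2 × 3 × 15 = 90, as in the design just described). The placements, effect sizes, and 0-10 pile scale are invented. Treating the placements of a forced sort as independent observations is a simplification, and the sketch should not be read as Rawn's actual computation.

```python
import random
from statistics import fmean


def two_way_anova(cells):
    """Balanced two-way analysis of variance with replication.

    `cells` maps (a_level, b_level) -> list of scores; every cell must hold
    the same number of scores. Returns sums of squares, df, and F ratios.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    p, q = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))

    scores = [x for xs in cells.values() for x in xs]
    grand = fmean(scores)
    cell_mean = {k: fmean(v) for k, v in cells.items()}
    a_mean = {a: fmean([cell_mean[(a, b)] for b in b_levels]) for a in a_levels}
    b_mean = {b: fmean([cell_mean[(a, b)] for a in a_levels]) for b in b_levels}

    ss_a = n * q * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * p * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction
    ss_within = sum((x - cell_mean[k]) ** 2 for k, xs in cells.items() for x in xs)
    ss_total = sum((x - grand) ** 2 for x in scores)

    df = {"A": p - 1, "B": q - 1, "AB": (p - 1) * (q - 1), "within": p * q * (n - 1)}
    ms_within = ss_within / df["within"]
    f = {k: (s / df[k]) / ms_within for k, s in (("A", ss_a), ("B", ss_b), ("AB", ss_ab))}
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "within": ss_within, "total": ss_total}
    return {"SS": ss, "df": df, "F": f}


# Hypothetical sort: 2 levels of resistance x 3 classes of transference,
# 15 statements per cell, placed on an invented 0-10 pile scale.
rng = random.Random(1)
shifts = {("high", "class 1"): 2, ("high", "class 2"): -2, ("high", "class 3"): 0,
          ("low", "class 1"): 1, ("low", "class 2"): 0, ("low", "class 3"): -1}
sort = {cell: [min(10, max(0, round(5 + s + rng.gauss(0, 2)))) for _ in range(15)]
        for cell, s in shifts.items()}

result = two_way_anova(sort)
for source in ("A", "B", "AB"):
    print(f"{source:>6}: SS = {result['SS'][source]:7.2f}  "
          f"df = {result['df'][source]}  F = {result['F'][source]:.2f}")
print(f"within: SS = {result['SS']['within']:7.2f}  df = {result['df']['within']}")
```

The main effects show the relative status the sorter gave each a priori class of statements. The interaction term shows whether the classes of transference were placed differently at the two levels of resistance.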

Perhaps some of the most substantial
values to accrue from the point of view
known as Q methodology may lie in the
fact that more of us become increasingly
thoughtful about many matters which we
have disregarded or postponed. Possibly
one of these neglected matters is the
hiatus between the clinicians who
continue to be interested in
intra-individual differences and the
psychometricians who, acting from the
standpoint of interindividual concepts
of reliability, have dismissed
intra-individual differences as trivial
or of no possible consequence.

There is a general tendency for in- 
vestigators to compute correlation 
coefficients without giving much 
thought to the meaning or the de- 
terminers of the relationship. Q 
methodology is leading us to think 
more realistically about features 
which contribute to the degree of 
correlation between either subjects or 
items. If, for example, the sample of 
items is not homogeneous, it would 
seem possible for several pairs of 
persons to be equally correlated with 
each other but for the various pairs 
to have their respective correlations 
because of different items of behavior. 
As a consequence, none of the items 
may characterize all of the intercor- 
related persons. Presumably a sim- 
ilar kind of situation could exist if 
items were intercorrelated for a 
group of persons representing sub- 
samples of different populations. In 
such a case the correlations found 
between any two items might vary 
considerably if they were separately 
calculated for the various subsamples 
instead of being calculated for the 
heterogeneous group. Obviously, the 
investigator is on shaky ground when 
he assumes that a correlation based 
on one sample is descriptive of some 
other sample which is comprised in 
some different manner. The com- 
position of the sample with respect to 
persons can obviously influence the 
correlation between items, or the com- 
position of a sample with respect to 
items could influence the correlation 
between persons. 
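The point about heterogeneous samples can be shown with a deliberately extreme, hypothetical example. Within each of two invented subsamples, the correlation between two items is zero. Pooling the subsamples, whose means differ widely, nonetheless produces a correlation near unity.

```python
from statistics import fmean


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Two hypothetical subsamples of persons, each scored on items x and y.
group_1 = {"x": [0, 1, 0, 1], "y": [1, 0, 0, 1]}
group_2 = {"x": [10, 11, 10, 11], "y": [11, 10, 10, 11]}

r_1 = pearson(group_1["x"], group_1["y"])
r_2 = pearson(group_2["x"], group_2["y"])
r_pooled = pearson(group_1["x"] + group_2["x"], group_1["y"] + group_2["y"])

print(f"within subsample 1: r = {r_1:.2f}")       # r = 0.00
print(f"within subsample 2: r = {r_2:.2f}")       # r = 0.00
print(f"pooled sample:      r = {r_pooled:.2f}")  # r = 0.99
```

The pooled coefficient describes only the difference between the two subsamples and says nothing about how the items relate within either one.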

Some aspects of this problem of 
subject homogeneity were discussed 
by Block in 1955, and in the same
year Nunnally (1955a) described a
hypothetical matrix where sample
heterogeneity with respect to persons 
resulted in very low correlations be- 
tween variables while the obverse 
type correlations between the in- 
dividuals were very high. Nunnally 
implied that ipsative scores are par- 
ticularly valuable in yielding Q-type 
correlations which could reveal trends 
not apparent from R-type analyses. 
The way in which this matter de- 
pends upon the homogeneity of sam- 
ples and the way in which it may be 
related to type of scale were not 
made explicit, however. 

The growing interest in Q proce- 
dure has generated several method- 
ological studies. Cohen (1957) has 
prepared a monograph which permits 
the investigator to read correlation 
coefficients between Q sorts, and 
Creaser (1955) has recommended a 
way for determining the amount an 
item should be weighted with respect 
to a given factor. 

Goodling and Guthrie (1956) point 
out that the sample of items for Q 
sort should be selected in such a way 
as to provide maximum intersubject 
variability and minimum intrasub- 
ject variability. The question of in- 
trasubject variability is one aspect of 
the reliability question, and this has 
been attacked directly by some in- 
vestigators. For example, Hilden 
(1958) describes a sampling experi- 
ment where he begins with a universe 
of 1,575 statements from which he 
has randomly drawn 20 samples of 50 
statements each. He had four graduate
students provide self-ideal sorts for
each of the 20 random sets and for the
total population as well. The various
scores, e.g., self-ideal, from any one
set were correlated with each other,
and the respective correlations were
determined for the population. When
the correlations for the random sets
were compared with the correlations
for the parent population, no reliable
differences were found. From this one
might infer that, when items such as
these are used, a sample of 50
statements may be sufficient for
Q-sort purposes.

J. R. WITTENBORN
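The logic of Hilden's sampling experiment can be sketched as follows. The universe here is synthetic: 1,575 hypothetical statements, each given a "self" score and an "ideal" score constructed to be moderately correlated. From this universe, 20 random samples of 50 statements are drawn without replacement. The sketch compares the self-ideal correlation in each sample with the same correlation in the whole universe. It illustrates the form of the comparison only, not Hilden's data or result.

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
UNIVERSE_SIZE, N_SAMPLES, SAMPLE_SIZE = 1575, 20, 50

# Hypothetical universe: "self" and "ideal" scores for every statement,
# built so that their population correlation is roughly .6.
self_scores = [rng.gauss(0, 1) for _ in range(UNIVERSE_SIZE)]
ideal_scores = [0.6 * s + 0.8 * rng.gauss(0, 1) for s in self_scores]
population_r = pearson(self_scores, ideal_scores)

sample_rs = []
for _ in range(N_SAMPLES):
    idx = rng.sample(range(UNIVERSE_SIZE), SAMPLE_SIZE)  # without replacement
    sample_rs.append(pearson([self_scores[i] for i in idx],
                             [ideal_scores[i] for i in idx]))

print(f"population r = {population_r:.2f}")
print(f"sample rs: min {min(sample_rs):.2f}, "
      f"mean {fmean(sample_rs):.2f}, max {max(sample_rs):.2f}")
```

The spread of the 20 sample coefficients around the population value indicates how much sampling error a 50-statement set introduces for these particular synthetic items.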

There appears to be a general tend- 
ency among investigators to require 
their subjects to distribute their Q 
sorts in a quasi-normal fashion. This 
is in spite of the fact that Stephenson 
had recommended a flattened, bell- 
shaped distribution and that subse- 
quent investigators had questioned 
the desirability of quasi-normal dis- 
tributions. Jones (1956), for example, 
had noted that the free sorts of vari- 
ous groups differed appreciably from 
each other and that no group se- 
lected a bell-shaped distribution. 
Livson and Nichols (1956) had examined
this problem from the standpoint of
the number of discriminations that
variously shaped distributions involve,
and noted that the
more discriminations required, the 
greater the test-retest reliability of 
the sort. On the basis of this finding, 
these authors recommend that the Q- 
sort distribution should be rectangu- 
lar. The issue of forced vs. unforced 
sorts has been discussed in numerous 
contexts, and no final agreement seems
to have been reached. For example, 
Jones points out that there is no one 
preferred distribution, and Block 
(1956) believes, on the basis of his 
comparisons, that the forced sort 
method is equal or superior to free 
sorts. 
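One way to see why the shape of a forced distribution bears on the number of discriminations is to count the item pairs that a sorter must place in different piles. Taking that count as the measure is an assumption made for illustration, and Livson and Nichols' own index may be defined differently. For a fixed number of items and piles, the count is largest when the piles are as nearly equal as possible, that is, for a rectangular distribution. The two 54-item, nine-pile distributions below are hypothetical.

```python
from math import comb


def pairwise_discriminations(pile_sizes):
    """Number of item pairs placed in different piles (assumed measure).

    All pairs, C(n, 2), minus the pairs that fall together in one pile.
    """
    n = sum(pile_sizes)
    return comb(n, 2) - sum(comb(k, 2) for k in pile_sizes)


quasi_normal = [2, 4, 6, 8, 14, 8, 6, 4, 2]  # 54 items, 9 piles
rectangular = [6] * 9                        # 54 items, 9 piles

print(pairwise_discriminations(quasi_normal))  # 1240
print(pairwise_discriminations(rectangular))   # 1296
```

Under this measure the rectangular sort requires 56 more pairwise separations than the quasi-normal one, which is consistent with the direction of the recommendation reported above.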

Whether Q methodology will, as 
Stephenson proposed, create a psy- 
chology of the individual remains to 
be seen. From the standpoint of 
psychometry with its emphasis on in- 
dividual differences or from the 
standpoint of psychoanalysis with its 
avoidance of formal instrumentation,
Q methodology and the devices it in- 
cludes do not provide an orthodox 
approach to the study of the in- 
dividual. Certainly those particular 
psychologists who profess to be in- 
terested primarily in the individual 
have not rushed to apply this method 
to material which is still handled on 
an anecdotal or case history basis. 
Nevertheless, Q method’s primary 
contributions to psychology appear 
to be in the study of psychotherapy
and the related study of persons with 
personality disorders, and there are 
indications that this methodological 
emphasis can contribute to a broad 
study of personality and numerous 
related social problems. The growing 
acceptance of this methodological 
emphasis again reminds us that 
psychologists require flexible meth- 
ods for their researches and will not 
wait for any orthodoxy. 
PSYCHOTHERAPY AS A LEARNING PROCESS


ALBERT BANDURA 
Stanford University 


While it is customary to concep- 
tualize psychotherapy as a learning 
process, few therapists accept the full 
implications of this position. Indeed, 
this is best illustrated by the writings 
of the learning theorists themselves. 
Most of our current methods of 
psychotherapy represent an accumu- 
lation of more or less uncontrolled 
clinical experiences and, in many in- 
stances, those who have written 
about psychotherapy in terms of 
learning theory have merely substi- 
tuted a new language; the practice re- 
mains essentially unchanged (Dollard, 
Auld, & White, 1954; Dollard & 
Miller, 1950; Shoben, 1949). 

If one seriously subscribes to the
view that psychotherapy is a learn- 
ing process, the methods of treatment 
should be derived from our knowl- 
edge of learning and motivation. 
Such an orientation is likely to yield 
new techniques of treatment which, 
in many respects, may differ mark- 
edly from the procedures currently 
in use. 

Psychotherapy rests on a very
simple but fundamental assumption,
i.e., human behavior is modifiable
through psychological procedures.
When skeptics raise the question,
“Does psychotherapy work?” they
may be responding in part to the
mysticism that has come to surround
the term. Perhaps the more meaningful
question, and one which avoids the
meanings associated with the term
“psychotherapy,” is as follows: Can
human behavior be modified through
psychological means and, if so, what
are the learning mechanisms that
mediate behavior change?


In the sections that follow, some of 
these learning mechanisms will be 
discussed, and studies in which sys- 
tematic attempts have been made to 
apply these principles of learning to 
the area of psychotherapy will be re- 
viewed. Since learning theory itself 
is still somewhat incomplete, the list 
of psychological processes by which 
changes in behavior can occur should 
not be regarded as exhaustive, nor 
are they necessarily without overlap. 


COUNTERCONDITIONING 


Of the various treatment methods 
derived from learning theory, those 
based on the principle of counter- 
conditioning have been elaborated in 
greatest detail. Wolpe (1954, 1958, 
1959) gives a thorough account of this 
method, and additional examples of 
cases treated in this manner are pro- 
vided by Jones (1956), Lazarus and 
Rachman (1957), Meyer (1957), and 
Rachman (1959). Briefly, the prin- 
ciple involved is as follows: if strong 
responses which are incompatible 
with anxiety reactions can be made 
to occur in the presence of anxiety 
evoking cues, the incompatible re- 
sponses will become attached to these 
cues and thereby weaken or eliminate 
the anxiety responses. 

The first systematic psychothera- 
peutic application of this method 
was reported by Jones (1924b) in the 
treatment of Peter, a boy who showed 
severe phobic reactions to animals, 
fur objects, cotton, hair, and me- 
chanical toys. Counterconditioning 
was achieved by feeding the child in 
the presence of initially small but 
gradually increasing anxiety-arousing 
stimuli. A rabbit in a cage was placed
in the room at some distance so as 
not to disturb the boy's eating. Each 
day the rabbit was brought nearer 
to the table and eventually removed 
from the cage. During the final stage 
of treatment, the rabbit was placed 
on the feeding table and even in 
Peter's lap. Tests of generalization 
revealed that the fear responses had 
been effectively eliminated, not only 
toward the rabbit, but toward the 
previously feared furry objects as 
well. 

In this connection, it would be in- 
teresting to speculate on the diag- 
nosis and treatment Peter would have 
received had he been seen by Melanie 
Klein (1949) rather than by Mary 
Cover Jones! 

It is interesting to note that while 
both Shoben (1949) and Wolpe 
(1958) propose a therapy based on 
the principle of counterconditioning, 
their treatment methods are radically 
different. According to Shoben, the 
patient discusses and thinks about 
stimulus situations that are anxiety 
provoking in the context of an inter- 
personal situation which simultane- 
ously elicits positive affective re- 
sponses from the patient. The thera- 
peutic process consists in connecting 
the anxiety provoking stimuli, which 
are symbolically reproduced, with 
the comfort reaction made to the 
therapeutic relationship. 

Shoben’s paper represents primar- 
ily a counterconditioning interpreta- 
tion of the behavior changes brought 
about through conventional forms of 
psychotherapy since, apart from high- 
lighting the role of positive emotional 
reactions in the treatment process, 
no new techniques deliberately designed
to facilitate relearning through
counterconditioning are proposed.

This is not the case with Wolpe, 
who has made a radical departure
from tradition. In his treatment,
which he calls reciprocal inhibition, 
Wolpe makes systematic use of three 
types of responses which are antag- 
onistic to, and therefore inhibitory of, 
anxiety. These are: assertive or ap- 
proach responses, sexual responses, 
and relaxation responses. 

On the basis of historical informa- 
tion, interview data, and psycho- 
logical test responses, the therapist 
constructs an anxiety hierarchy, a 
ranked list of stimuli to which the 
patient reacts with anxiety. In the 
case of desensitization based on re- 
laxation, the patient is hypnotized 
and given relaxation suggestions. He 
is then asked to imagine a scene 
representing the weakest item on the 
anxiety hierarchy and, if the relaxa- 
tion is unimpaired, this is followed 
by having the patient imagine the 
next item on the list, and so on. 
Thus, the anxiety cues are gradually 
increased from session to session until 
the last phobic stimulus can be pre- 
sented without impairing the re- 
laxed state. Through this procedure, 
relaxation responses eventually come 
to be attached to the anxiety evoking 
stimuli. 
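The ordering rule of the desensitization procedure can be written out schematically. In the sketch below, the judgement of whether relaxation remained unimpaired is a hypothetical callback supplied by the user. The text does not say what follows an impaired presentation, so repeating the same item is an assumption. The sketch captures only the sequencing of the hierarchy, not hypnosis, relaxation training, or clinical judgement.

```python
from typing import Callable, List, Tuple


def run_hierarchy(hierarchy: List[str],
                  relaxation_unimpaired: Callable[[str, int], bool],
                  max_presentations: int = 100) -> List[Tuple[str, bool]]:
    """Present hierarchy items weakest-first; advance only when relaxation holds.

    hierarchy: items ranked from weakest to strongest anxiety-evoking.
    relaxation_unimpaired(item, attempt): hypothetical judgement of whether the
        patient stayed relaxed on this attempt (attempts count from 1).
    Returns a log of (item, relaxed) pairs in presentation order.
    """
    log: List[Tuple[str, bool]] = []
    position, attempt = 0, 1
    while position < len(hierarchy) and len(log) < max_presentations:
        item = hierarchy[position]
        relaxed = relaxation_unimpaired(item, attempt)
        log.append((item, relaxed))
        if relaxed:
            position, attempt = position + 1, 1  # move up to the next item
        else:
            attempt += 1  # assumption: present the same item again
    return log


# Hypothetical usage: relaxation fails only on the first attempt at item 3.
hierarchy = ["item 1 (weakest)", "item 2", "item 3", "item 4 (strongest)"]


def judge(item: str, attempt: int) -> bool:
    return not (item == "item 3" and attempt == 1)


log = run_hierarchy(hierarchy, judge)
for item, relaxed in log:
    print(item, "relaxed" if relaxed else "relaxation impaired; repeat")
```

The `max_presentations` cap is only a safeguard so that the loop always terminates; it has no counterpart in the procedure described above.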

Wolpe reports remarkable thera- 
peutic success with a wide range of 
neurotic reactions treated on this 
counterconditioning principle. He 
also contends that the favorable out- 
comes achieved by the more conven- 
tional psychotherapeutic methods 
may result from the reciprocal in- 
hibition of anxiety by strong positive 
responses evoked in the patient-therapist
relationship.

Although the counterconditioning 
method has been employed most ex- 
tensively in eliminating anxiety- 
motivated avoidance reactions and 
inhibitions, it has been used with 
some success in reducing maladaptive 
approach responses as well. In the 
latter case, the goal object is repeatedly
associated with some form
of aversive stimulus. 

Raymond (1956), for example, 
used nausea as the aversion experi- 
ence in the treatment of a patient 
who presented a fetish for handbags 
and perambulators which brought 
him into frequent contact with the 
law in that he repeatedly smeared 
mucus on ladies’ handbags and de- 
stroyed perambulators by running 
into them with his motorcycle. 
Although the patient had undergone
psychoanalytic treatment and was
fully aware of the origin and the
sexual significance of his behavior,
the fetish persisted.

The treatment consisted of showing
the patient a collection of handbags,
perambulators, and colored
illustrations just before the onset of
nausea produced by injections of 
apomorphine. The conditioning was 
repeated every 2 hours day and night 
for 1 week plus additional sessions 8 
days and 6 months later. 

Raymond reports that, not only 
was the fetish successfully eliminated, 
but also the patient showed a vast 
improvement in his social (and legal) 
relationships, was promoted to a 
more responsible position in his work, 
and no longer required the fetish fan- 
tasies to enable him to have sexual 
intercourse. 

Drugs, especially emetics, have also
been utilized as the unconditioned
stimulus in the aversion treatment of
alcoholism (Thimann, 1949; Thompson
& Bielinski, 1953; Voegtlin, 1940;
Wallace, 1949). Typically, 4 to 10
treatments, in which the sight, smell,
and taste of alcohol are associated
with the onset of nausea, are
sufficient to produce abstinence. Of
the cases on whom adequate follow-up
data are reported, approximately 60%
of the patients have been totally
abstinent following the treatment.
Voegtlin (1940)
suggests that a few preventive treat- 
ments given at an interval of about 6 
months may further improve the re- 
sults yielded by this method. 

Despite these encouraging findings, 
most psychotherapists are unlikely 
to be impressed since, in their opinion, 
the underlying causes for the alco- 
holism have in no way been modified 
by the conditioning procedure and, 
if anything, the mere removal of the 
alcoholism would tend to produce 
symptom substitution or other ad- 
verse effects. A full discussion of this 
issue will be presented later. In this 
particular context, however, several 
aspects of the Thompson and Bielin- 
ski (1953) data are worth noting. 
Among the alcoholic patients whom 
they treated, six “suffered from men- 
tal disorders not due to alcohol or 
associated deficiency states.” The
authors had planned to follow up the
aversion treatment with psychotherapy
for the underlying psychosis.
This, however, proved unnecessary 
since all but one of the patients, a 
case of chronic mental deterioration, 
showed marked improvement and 
were in a state of remission. 

Max (1935) employed a strong 
electric shock as the aversive stimulus 
in treating a patient who tended to 
display homosexual behavior follow- 
ing exposure to a fetishistic stimulus. 
Both the fetish and the homosexual 
behavior were removed through a 
series of avoidance conditioning ses- 
sions in which the patient was ad- 
ministered shock in the presence of 
the fetishistic object. 

Wolpe (1958) has also reported 
favorable results with a similar pro- 
cedure in the treatment of obsessions. 

A further variation of the counter- 
conditioning procedure has been de- 
veloped by Mowrer and Mowrer 
(1938) for use with enuretic patients.
The device consists of a wired bed 
pad which sets off a loud buzzer and 
awakens the child as soon as mictu- 
rition begins. Bladder tension thus 
becomes a cue for waking up which, 
in turn, is followed by sphincter con- 
traction. Once bladder pressure be- 
comes a stimulus for the more remote 
sphincter control response, the child 
is able to remain dry for relatively 
long periods of time without waken- 
ing. 

Mowrer and Mowrer (1938) report 
complete success with 30 children 
treated by this method; similarly, 
Davidson and Douglass (1950) 
achieved highly successful results 
with 20 chronic enuretic children (15 
cured, 5 markedly improved); of 5 
cases treated by Morgan and Witmer 
(1939), 4 of the children not only 
gained full sphincter control, but also 
made a significant improvement in 
their social behavior. The one child 
with whom the conditioning approach 
had failed was later found to have 
bladder difficulties which required 
medical attention. 

Some additional evidence for the 
efficacy of this method is provided 
by Martin and Kubly (1955) who ob- 
tained follow-up information from 
118 of 220 parents who had treated 
their children at home with this type 
of conditioning apparatus. In 74% 
of the cases, according to the parents’ 
replies, the treatment was successful. 


EXTINCTION 


“When a learned response is re- 
peated without reinforcement the 
strength of the tendency to perform 
that response undergoes a progressive 
decrease” (Dollard & Miller, 1950). 
Extinction involves the development
of inhibitory potential, which is
composed of two components. The
evocation of any reaction generates
reactive inhibition (I_R), which
presumably dissipates with time. When
reactive inhibition (fatigue, etc.)
reaches a high point, the cessation of
activity alleviates this negative
motivational state, and any stimuli
associated with the cessation of the
response become conditioned inhibitors
(sI_R).

One factor that has been shown to 
influence the rate of extinction of 
maladaptive and anxiety-motivated 
behavior is the interval between ex- 
tinction trials. In general, there 
tends to be little diminution in the 
strength of fear-motivated behavior 
when extinction trials are widely 
distributed, whereas under massed 
trials, reactive inhibition builds up 
rapidly and consequently extinction 
is accelerated (Calvin, Clifford, Clif- 
ford, Bolden, & Harvey, 1956; Ed- 
monson & Amsel, 1954). 
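A toy version of the reactive-inhibition account shows why massing trials speeds its build-up. In the sketch, each response adds a fixed increment of inhibition, and this inhibition decays exponentially during the interval before the next trial. The exponential form, the parameter values, and the arbitrary units are all assumptions, and conditioned inhibition is not modelled.

```python
import math


def reactive_inhibition_after(trials: int, interval: float,
                              increment: float = 1.0,
                              decay_rate: float = 0.5) -> float:
    """Toy model of reactive inhibition (arbitrary units, assumed parameters).

    Each response adds `increment`; between trials the accumulated inhibition
    decays by exp(-decay_rate * interval). Returns the level immediately
    after the last response.
    """
    inhibition = 0.0
    for trial in range(trials):
        if trial > 0:
            inhibition *= math.exp(-decay_rate * interval)  # dissipation
        inhibition += increment
    return inhibition


massed = reactive_inhibition_after(trials=10, interval=0.1)
distributed = reactive_inhibition_after(trials=10, interval=10.0)
print(f"massed trials:      {massed:.2f}")
print(f"distributed trials: {distributed:.2f}")
```

Under these assumed parameters, ten massed trials leave roughly eight units of inhibition, whereas ten widely spaced trials leave little more than one.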

An illustration of the application of 
this principle is provided by Yates 
(1958) in the treatment of tics. Yates 
demonstrated, in line with the findings
from laboratory studies of extinction
under massed and distributed practice,
that massed sessions in which the
patient performed tics voluntarily,
followed by prolonged rest to allow for
the dissipation of reactive inhibition,
was the most effective procedure for
extinguishing the tics.

It should be noted that the extinction
procedure employed by Yates is very
similar to Dunlap’s method of negative
practice, in which the subject
reproduces the negative behaviors
voluntarily without reinforcement
(Dunlap, 1932; Lehner, 1954). This
method has been applied most
frequently, with varying degrees of
success, to the treatment of speech
disorders (Fishman, 1937; Meissner,
1946; Rutherford, 1940; Sheehan,
1951; Sheehan & Voas, 1957). If the
effectiveness of this psychotherapeutic
technique is due primarily to
extinction, as Yates’s study suggests,
the usual practice of terminating a
treatment session before the subject
becomes fatigued (Lehner, 1954) would
have the effect of reducing the rate of
extinction, and may in part account for
the divergent results yielded by this
method.

Additional examples of the thera- 
peutic application of extinction pro- 
cedures are provided by Jones (1955), 
and most recently by C. D. Williams 
(1959). 

Most of the conventional forms of 
psychotherapy rely heavily on ex- 
tinction effects although the therapist 
may not label these as such. For 
example, many therapists consider 
permissiveness to be a necessary con- 
dition of therapeutic change (Alex- 
ander, 1956; Dollard & Miller, 1950; 
Rogers, 1951). It is expected that 
when a patient expresses thoughts or 
feelings that provoke anxiety or guilt 
and the therapist does not disap- 
prove, criticize, or withdraw interest, 
the fear or guilt will be gradually 
weakened or extinguished. The ex- 
tinction effects are believed to gen- 
eralize to thoughts concerning related 
topics that were originally inhibited, 
and to verbal and physical forms of
behavior as well (Dollard & Miller,
1950).

Some evidence for the relationship
between permissiveness and the
extinction of anxiety is provided in two
studies recently reported by Dittes
(1957a, 1957b). In one study (1957b),
involving an analysis of patient-therapist
interaction sequences, Dittes
found that permissive responses on the
part of the therapist were followed by a
corresponding decrease in the patient’s
anxiety (as measured by the GSR) and in
the occurrence of avoidance behaviors. A
sequential analysis of the therapeutic
sessions (Dittes, 1957a) revealed that, at
the beginning of treatment, sex
expressions were accompanied by strong anxiety
reactions; under the cumulative ef-
fects of permissiveness, the anxiety 
gradually extinguished. 

In contrast to counterconditioning, 
extinction is likely to be a less effec- 
tive and a more time consuming 
method for eliminating maladaptive 
behavior (Jones, 1924a; Dollard & 
Miller, 1950); in the case of conven- 
tional interview therapy, the rela- 
tively long intervals between inter- 
view sessions, and the ritualistic 
adherence to the 50-minute hour may 
further reduce the occurrence of ex- 
tinction effects. 


DISCRIMINATION LEARNING 


Human functioning would be ex- 
tremely difficult and inefficient if a 
person had to learn appropriate be- 
havior for every specific situation he 
encountered. Fortunately, patterns 
of behavior learned in one situation 
will transfer or generalize to other 
similar situations. On the other hand, 
if a person overgeneralizes from one 
situation to another, or if the gen- 
eralization is based on superficial or 
irrelevant cues, behavior becomes 
inappropriate and maladaptive. 

In most theories of psychotherapy, 
therefore, discrimination learning, 
believed to be accomplished through 
the gaining of awareness or insight, 
receives emphasis (Dollard & Miller, 
1950; Fenichel, 1941; Rogers, 1951; 
Sullivan, 1953). It is generally as- 
sumed that if a patient is aware of 
the cues producing his behavior, of 
the responses he is making, and of the 
reasons that he responds the way he
does, his behavior will become more
susceptible to verbally-mediated con- 
trol. Voluntarily guided, discrimina- 
tive behavior will replace the auto- 
matic, overgeneralized reactions. 

While this view is widely accepted, 
as evidenced in the almost exclusive 
reliance on interview procedures and 
on interpretative or labeling tech-
niques, a few therapists (Alexander &
French, 1946) have questioned the 
importance attached to awareness in 
producing modifications in behavior. 
Whereas most psychoanalysts (Fen- 
ichel, 1941), as well as therapists 
representing other points of view 
(Fromm-Reichmann, 1950; Sullivan, 
1953) consider insight a precondition 
of behavior change, Alexander and 
French consider insight or awareness 
a result of change rather than its 
cause. That is, as the patient's anxie- 
ties are gradually reduced through 
the permissive conditions of treat- 
ment, formerly inhibited thoughts 
are gradually restored to awareness. 

Evidence obtained through con- 
trolled laboratory studies concerning 
the value of awareness in increasing 
the precision of discrimination has so 
far been largely negative or at least 
equivocal (Adams, 1957; Erikson, 
1958; Razran, 1949). A study by
Lacy and Smith (1954), in which they
found that aware subjects generalized
anxiety reactions less extensively
than did subjects who were unaware
of the conditioned stimulus, provides
evidence that awareness may aid
discrimination. However, other aspects
of their findings (e.g., the magnitude
of the anxiety reactions to the
generalization stimuli was greater than
it was to the conditioned stimulus
itself) indicate the need for replication.

If future research continues to 
demonstrate that awareness exerts 
little influence on the acquisition, 
generalization, and modification of 
behavior, such negative results would 
cast serious doubt on the value of 
currently popular psychotherapeutic 
procedures whose primary aim is the 
development of insight. 


METHODS OF REWARD 


Most theories of psychotherapy 
are based on the assumption that the 
patient has a repertoire of previously
learned positive habits available to 
him, but that these adaptive patterns 
are inhibited or blocked by compet- 
ing responses motivated by anxiety 
or guilt. The goal of therapy, then, is 
to reduce the severity of the internal 
inhibitory controls, thus allowing the 
healthy patterns of behavior to 
emerge. Hence, the role of the thera- 
pist is to create permissive conditions 
under which the patient’s “normal 
growth potentialities” are set free 
(Rogers, 1951). The fact that most of
our theories of personality and thera-
peutic procedures have been devel-
oped primarily through work with 
oversocialized, neurotic patients may 
account in part for the prevalence of 
this view. 

There is a large class of disorders 
(the undersocialized, antisocial per- 
sonalities whose behavior reflects a 
failure of the socialization process) 
for whom this model of personality 
and accompanying techniques of 
treatment are quite inappropriate 
(Bandura & Walters, 1959; Schmideberg,
1959). Such antisocial personalities
are likely to present learning
deficits; consequently, the goal of
therapy is the acquisition of secondary
motives and the development of
internal restraint habits. That antisocial
patients prove unresponsive to
psychotherapeutic methods developed
for the treatment of oversocialized
neurotics has been demonstrated in a
number of studies comparing patients
who remain in treatment with
those who terminate treatment
prematurely (Rubenstein & Lorr, 1956).
It is for this class of patients that the
greatest departures from traditional
treatment methods are needed.

While counterconditioning, extinction,
and discrimination learning may
be effective ways of removing neurotic
inhibitions, these methods may
be of relatively little value in developing
new positive habits. Primary and
secondary rewards in the form of the 
therapist's interest and approval may 
play an important, if not indispensa- 
ble, role in the treatment process. 
Once the patient has learned to want 
the interest and approval of the 
therapist, these rewards may then be 
used to promote the acquisition of 
new patterns of behavior. For certain 
classes of patients such as schizo- 
phrenics (Atkinson, 1957; Peters, 
1953; Robinson, 1957) and delin- 
quents (Cairns, 1959), who are either 
unresponsive to, or fearful of, social 
rewards, the therapist may have to 
rely initially on primary rewards in 
the treatment process. 

An ingenious study by Peters and 
Jenkins (1954) illustrates the applica- 
tion of this principle in the treatment 
of schizophrenic patients. Chronic 
patients from closed wards were ad- 
ministered subshock injections of 
insulin designed to induce the hunger 
drive. The patients were then en- 
couraged to solve a series of graded 
problem tasks with fudge as the reward.
This program was followed 5
days a week for 3 months.

Initially the tasks involved simple
mazes and obstruction problems in
which the patients obtained the food
reward directly upon successful
completion of the problem. Tasks of
gradually increasing difficulty were
then administered, involving
multiple-choice learning and verbal-reasoning
problems in which the experimenter
personally mediated the primary
rewards. After several weeks of
problem-solving activities, the
insulin injections were discontinued
and social rewards, which by this
time had become more effective, were
given for solving interpersonal problems
that the patients were likely to
encounter in their daily activities
both inside and outside the hospital
setting.

Comparison of the treated group
with control groups, designed to iso- 
late the effects of insulin and special 
attention, revealed that the patients 
in the reward group improved signifi- 
cantly in their social relationships in 
the hospital, whereas the patients in 
the control groups showed no such 
change. 

King and Armitage (1958) report a 
somewhat similar study in which 
severely withdrawn schizophrenic pa- 
tients were treated with operant 
conditioning methods; candy and 
cigarettes served as the primary re- 
wards for eliciting and maintaining 
increasingly complex forms of behav- 
ior, i.e., psychomotor, verbal, and 
interpersonal responses. Unlike the 
Peters and Jenkins study, no attempt 
was made to manipulate the level of 
primary motivation. 

An interesting feature of the ex- 
perimental design was the inclusion 
of a group of patients who were 
treated with conventional interview 
therapy, as well as a recreational 
therapy and a no-therapy control 
group. It was found that the operant 
group, in relation to similar patients 
in the three control groups, made 
significantly more clinical improve- 
ment. 

Skinner (1956b) and Lindsley
(1956), working with adult psychotics,
and Ferster (1959), working with
autistic children, have been successful
in developing substantial amounts of 
reality-oriented behavior in their 
patients through the use of reward. 
So far their work has been concerned 
primarily with the effect of schedules 
of reinforcement on the rate of evoca- 
tion of simple impersonal reactions. 
There is every indication, however,
that by varying the contingency of
the reward (e.g., the patient must
respond in certain specified ways to
the behavior of another individual in
order to produce the reward), adaptive
interpersonal behaviors can be
developed as well (Azrin & Lindsley,
1956).

The effectiveness of social rein- 
forcers in modifying behavior has 
been demonstrated repeatedly in 
verbal conditioning experiments 
(Krasner, 1958; Salzinger, 1959). 
Encouraged by these findings, several 
therapists have begun to experiment 
with operant conditioning as a meth- 
od of treatment in its own right 
(Tilton, 1956; Ullman, Krasner, & 
Collins, in press; R. I. Williams, 
1959); the operant conditioning stud- 
ies cited earlier are also illustrative of 
this trend. 

So far the study of generalization 
and permanence of behavior changes 
brought about through operant con- 
ditioning methods has received rela-
tively little attention, and the scanty
data available are equivocal (Rogers, 
1960; Sarason, 1957; Weide, 1959). 
The lack of consistency in results is 
hardly surprising considering that 
the experimental manipulations in 
many of the conditioning studies are 
barely sufficient to demonstrate con- 
ditioning effects, let alone generaliza- 
tion of changes to new situations. On 
the other hand, investigators who 
have conducted more intensive rein- 
forcement sessions, in an effort to test 
the efficacy of operant conditioning 
methods as a therapeutic technique, 
have found significant changes in pa- 
tients’ interpersonal behavior in ex- 
tra-experimental situations (King & 
Armitage, 1958; Peters & Jenkins, 
1954; Ullman et al., in press). These 
findings are particularly noteworthy
since the response classes involved
are similar to those that psychotherapists
are primarily concerned with modifying
through interview forms of treatment.
If the favorable results yielded by
these studies are replicated in future
investigations, it is likely that the
next few years will witness an increasing
reliance on conditioning forms of
psychotherapy, particularly in the
treatment of psychotic patients.

At this point it might also be noted 
that, consistent with the results from
verbal conditioning experiments, content
analyses of psychotherapeutic
interviews (Bandura, Lipsher, &
Miller, 1960; Murray, 1956) suggest
that many of the changes observed in 
psychotherapy, at least insofar as the 
patients’ verbal behavior is con- 
cerned, can be accounted for in terms 
of the therapists’ direct, although 
usually unwitting, reward and pun- 
ishment of the patients’ expressions. 


PUNISHMENT 


While positive habits can be read- 
ily developed through reward, the 
elimination of socially disapproved 
habits, which becomes very much an 
issue in the treatment of antisocial 
personalities, poses a far more com- 
plex problem. 

The elimination of socially disap- 
proved behaviors can be accom- 
plished in several ways. They may 
be consistently unrewarded and thus 
extinguished. However, antisocial
behavior, particularly of an extreme
form, cannot simply be ignored in
the hope that it will gradually extinguish.
Furthermore, since the successful
execution of antisocial acts
may bring substantial material rewards
as well as the approval and
admiration of associates, it is extremely
unlikely that such behavior
would ever extinguish.

Although punishment may lead to
the rapid disappearance of socially
disapproved behavior, its effects are
far more complex (Estes, 1944;
Solomon, Kamin, & Wynne, 1953).
If a person is punished for some socially
disapproved habit, the impulse
to perform the act becomes, through
its association with punishment, a
stimulus for anxiety. This anxiety
then motivates competing responses
which, if sufficiently strong, prevent
the occurrence of, or inhibit, the
disapproved behavior. Inhibited responses
may not, however, thereby
lose their strength, and may reappear
in situations where the threat of
punishment is weaker. Punishment
may, in fact, prevent the extinction
of a habit; if a habit is completely
inhibited, it cannot occur, and therefore
it cannot go unrewarded and be extinguished.

Several other factors point to the
futility of punishment as a means of
correcting many antisocial patterns.
The threat of punishment is very
likely to elicit conformity; indeed,
the patient may obligingly do whatever
he is told to do in order to avoid
immediate difficulties. This does not
mean, however, that he has acquired
a set of sanctions that will be of
service to him once he is outside the
treatment situation. In fact, rather
than leading to the development of
internal controls, such methods are
likely only to increase the patient’s
reliance on external restraints. Moreover,
“under these conditions, the
majority of patients will develop the
attitude that they will do only what 
they are told to do—and then often 
only half-heartedly—and that they 
can do as they please once they are
free from the therapist’s supervision”
(Bandura & Walters, 1959).

In addition, punishment may serve
only to intensify hostility and other
negative motivations and thus may
further instigate the antisocial person
to display the very behaviors that
the punishment was intended to
bring under control.

Mild aversive stimuli have been
used, of course, in the treatment
of patients who express a desire
to rid themselves of specific
conditions.

Liversedge and Sylvester (1955),
for example, successfully treated
seven cases of writer’s cramp by
means of a retraining procedure in- 
volving electric shock. In order to 
remove tremors, one component of 
the motor disorder, the patients were 
required to insert a stylus into a series 
of progressively smaller holes; each 
time the stylus made contact with 
the side of the hole the patients re- 
ceived a mild shock. The removal of 
the spasm component of the disorder 
was obtained in two ways. First, the 
patients traced various line patterns 
(similar to the movements required 
in writing) on a metal plate with a 
stylus, and any deviation from the 
path produced a shock. Following
training on the apparatus, the subjects
then wrote with an electrified
pen which delivered a shock whenever
excessive thumb pressure was
applied.

Liversedge and Sylvester report 
that following the retraining the pa- 
tients were able to resume work; a 
follow-up several months later indi- 
cated that the improvement was 
being maintained. 

The aversive forms of therapy, de- 
scribed earlier in the section on 
counterconditioning procedures, also 
make use of mild punishment. 


SOCIAL IMITATION


Although a certain amount of 
learning takes place through direct 
training and reward, a good deal of a 
person’s behavior repertoire may be 
acquired through imitation of what 
he observes in others. If this is the 
case, social imitation may serve as an 
effective vehicle for the transmission 
of prosocial behavior patterns in the 
treatment of antisocial patients. 

Merely providing a model for imi- 
tation is not, however, sufficient. 
Even though the therapist exhibits 
the kinds of behaviors that he wants 
the patient to learn, this is likely to 
have little influence on him if he 
rejects the therapist as a model.
Affectional nurturance is believed to 
be an important precondition for 
imitative learning to occur, in that 
affectional rewards increase the sec- 
ondary reinforcing properties of the 
model, and thus predispose the imi- 
tator to pattern his behavior after 
the rewarding person (Mowrer, 1950; 
Sears, 1957; Whiting, 1954). Some 
positive evidence for the influence of 
social rewards on imitation is pro- 
vided by Bandura and Huston (in 
press) in a recent study of identifica- 
tion as a process of incidental imita- 
tion. 

In this investigation preschool chil- 
dren performed an orienting task 
but, unlike most incidental learning 
studies, the experimenter performed 
the diverting task as well, and the 
extent to which the subjects pat- 
terned their behavior after that of the 
experimenter-model was measured. 

A two-choice discrimination problem
similar to the one employed by
Miller and Dollard (1941) in their
experiments on social imitation was
used as the diverting task. On each
trial, one of two boxes was loaded
with two rewards (small multicolor
pictures of animals) and the object
of the game was to guess which box
contained the stickers. The
experimenter-model (M) always had her
turn first and in each instance chose
the reward box. During M’s trial,
the subject remained at the starting
point where he could observe M’s
behavior. On each discrimination
trial M exhibited certain verbal,
motor, and aggressive patterns of
behavior that were totally irrelevant
to the task to which the subject’s
attention was directed. At the starting
point, for example, M made a verbal
response and then marched slowly
toward the box containing the stickers,
repeating, “March, march,
march.” On the lid of each box was a
rubber doll which M knocked off
aggressively when she reached the
designated box. She then paused
briefly, remarked, “Open the box,”
removed one sticker, and pasted it on
a pastoral scene which hung on the
wall immediately behind the boxes.
The subject then took his turn and
the number of M’s behaviors performed
by the subject was recorded.

A control group was included in
order (a) to provide a check on
whether the subjects’ performances
reflected genuine imitative learning
or merely the chance occurrence of
behaviors high in the subjects’ response
hierarchies, and (b) to determine
certain aspects of M’s behavior which 
involved considerable delay in re- 
ward. With the controls, therefore, 
M walked to the box, choosing a 
highly circuitous route along the sides 
of the experimental room; instead of 
aggressing toward the doll, she lifted 
it gently off the container. 

The results of this study indicate 
that, insofar as preschool children 
are concerned, a good deal of incidental
imitation of the behaviors displayed
by an adult model does occur.
Of the subjects in the experimental
group, 88% adopted M’s aggressive
behavior, 44% imitated the
marching, and 28% reproduced M’s
verbalizations. In contrast, none of 
the control subjects behaved aggres- 
sively, marched, or verbalized, while 
75% of the controls imitated the 
circuitous route to the containers. 

In order to test the hypothesis that 
children who experience a rewarding 
relationship with an adult model 
adopt more of the model’s behavior 
than do children who experience a
relatively distant and cold relationship,
half the subjects in the experiment
were assigned to a nurturant
condition and the other half to a
nonnurturant condition.

During the nurturant sessions, which
preceded the incidental learning, M
played with the subject, responded
readily to the subject's bids for atten- 
tion, and in other ways fostered a 
consistently warm and rewarding 
interaction with the child. In con- 
trast, during the nonnurturant ses- 
sions, the subject played alone while 
M busied herself with paperwork at a 
desk in the far corner of the room. 
Consistent with the hypothesis, it 
was found that subjects who experi- 
enced the rewarding interaction with 
M adopted significantly more of M's 
behavior than did subjects who were 
in the nonnurturance condition. 

A more crucial test of the transmission
of behavior patterns through the
process of social imitation involves
the delayed generalization of imitative
responses to new situations in
which the model is absent. A study
of this type, just completed, provides
strong evidence that observation of
the cues produced by the behavior of
models is an effective means of eliciting
responses for which the original
probability is very low (Bandura,
Ross, & Ross, in press).

Empirical studies of the correlates
of strong and weak identification
with parents lend additional support
to the theory that rewards promote
imitative learning. Boys whose
fathers are highly rewarding and
affectionate have been found to adopt
the father-role in doll-play activities
(…, 1953), to show father-son
similarity in response to items on a
questionnaire (Payne & Mussen, 1956),
and to display masculine behavior
(Mussen & Distler, …) to a greater
extent than boys whose fathers are
relatively cold and nonrewarding.

The treatment of older unsocialized
boys is a difficult task, since
they are relatively self-sufficient and
do not readily seek involvement with
a therapist. In many cases, socialization
can be accomplished only
through residential care and treatment.
In the treatment home, the
therapist can personally administer
many of the primary rewards and
mediate between the boys’ needs and
gratifications. Through the repeated
association with rewarding experi- 
ences for the boy, many of the thera- 
pist’s attitudes and actions will 
acquire secondary reward value, and 
thus the patient will be motivated to 
reproduce these attitudes and actions 
in himself. Once these attitudes and 
values have been thus accepted, the 
boy’s inhibition of antisocial tenden- 
cies will function independently of 
the therapist. 

While treatment through social
imitation has been suggested as a
method for modifying antisocial patterns,
it can be an effective procedure
for the treatment of other forms of
disorder as well. Jones (1924a), for
example, found that the social example
of children reacting normally
to stimuli feared by another child was
effective, in some instances, in eliminating
such phobic reactions. In
fact, next to counterconditioning, the
method of social imitation proved to
be the most effective in eliminating
inappropriate fears.

There is some suggestive evidence 
that by providing high prestige 
models and thus increasing the rein- 
forcement value of the imitatee’s 
behavior, the effectiveness of this 
method in promoting favorable ad- 
justive patterns of behavior may be 
further increased (Jones, 1924a; 
Mausner, 1953, 1954; Miller & Dollard,
1941).

During the course of conventional 
psychotherapy, the patient is exposed 
to many incidental cues involving 
the therapist’s values, attitudes, and 
patterns of behavior. They are inci- 
dental only because they are usually 
considered secondary or irrelevant to
the task of resolving the patient's 
problems. Nevertheless, some of the 
changes observed in the patient’s be- 
havior may result, not so much from 
the intentional interaction between 
the patient and the therapist, but 
rather from active learning by the 
patient of the therapist’s attitudes 
and values which the therapist never 
directly attempted to transmit. This 
is partially corroborated by Rosen- 
thal (1955) who found that, in spite 
of the usual precautions taken by 
therapists to avoid imposing their 
values on their clients, the patients 
who were judged as showing the 
greatest improvement changed their 
moral values (in the areas of sex, 
aggression, and authority) in the di- 
rection of the values of their thera- 
pists, whereas patients who were un- 
improved became less like the thera- 
pist in values. 


FACTORS IMPEDING INTEGRATION


In reviewing the literature on psy- 
chotherapy, it becomes clearly evi- 
dent that learning theory and general 
psychology have exerted a remark- 
ably minor influence on the practice 
of psychotherapy and, apart from the 
recent interest in Skinner’s operant 
conditioning methods (Krasner, 1955; 
Skinner, 1953), most of the recent 
serious attempts to apply learning 
principles to clinical practice have 
been made by European psychothera- 
pists (Jones, 1956; Lazarus & Rach- 
man, 1957; Liversedge & Sylvester, 
1955; Meyer, 1957; Rachman, 1959; 
Raymond, 1956; Wolpe, 1958; Yates, 
1958). This isolation of the methods 
of treatment from our knowledge of 
learning and motivation will continue 
to exist for some time since there are 
several prevalent attitudes that im- 
pede adequate integration. 

In the first place, the deliberate use 
of the principles of learning in the 
modification of human behavior im-
plies, for most psychotherapists, 
manipulation and control of the pa- 
tient, and control is seen by them as 
antihumanistic and, therefore, bad. 
Thus, advocates of a learning approach
to psychotherapy are often
charged with treating human beings
as though they were rats or pigeons
and with leading the way toward
Orwell’s 1984.

This does not mean that psycho- 
therapists do not influence and con- 
trol their patients’ behavior. On the 
contrary. In any interpersonal inter- 
action, and psychotherapy is no ex- 
ception, people influence and control 
one another (Frank, 1959; Skinner, 
1956a). Although the patient’s con- 
trol of the therapist has not as yet 
been studied (such control is evident 
when patients subtly reward the 
therapist with interesting historical 
material and thereby avoid the dis- 
cussion of their current interpersonal 
problems), there is considerable evi- 
dence that the therapist exercises 
personal control over his patients. 
A brief examination of interview
protocols of patients treated by therapists
representing differing theoretical
orientations clearly reveals that the
patients have been thoroughly conditioned
in their therapists’ idiosyncratic
languages. Client-centered
patients, for example, tend to produce
the client-centered terminology, theory,
and goals, and their interview
content shows little or no overlap
with that of patients seen in
psychoanalysis, who, in turn, tend to speak
the language of psychoanalytic theory
(Heine, 1950). Even more direct
evidence of the therapists’ controlling
influence is provided in studies of
patient-therapist interactions (Bandura
et al., 1960; Murray, 1956;
Rogers, 1960). The results of these
studies show that the therapist not
only controls the patient by rewarding
him with interest and approval
when the patient behaves in a fashion 
the therapist desires, but that he also 
controls through punishment, in the 
form of mild disapproval and with- 
drawal of interest, when the patient 
behaves in ways that are threatening 
to the therapist or run counter to his 
goals. 

One difficulty in understanding 
the changes that occur in the course 
of psychotherapy is that the inde- 
pendent variable, i.e., the therapist’s 
behavior, is often vaguely or only 
partially defined. In an effort to 
minimize or to deny the therapist’s 
directive influence on the patient, 
the therapist is typically depicted as 
a “catalyst” who, in some mysterious 
way, sets free positive adjustive patterns
of behavior or similar outcomes
usually described in very general and 
highly socially desirable terms. 

It has been suggested, in the material
presented in the preceding
sections, that many of the changes
that occur in psychotherapy derive
from the unwitting application of
known principles of learning.
The occurrence of the necessary
conditions for learning is more by
accident than by intent, and a more
deliberate application of our knowledge
of the learning process to psychotherapy
would perhaps yield far more
effective results.

The predominant approach in the
development of psychotherapeutic
procedures has been the “school”
approach. A similar trend is noted in
the methods being derived from learning
theory. Wolpe, for example, has
selected the principle of counterconditioning
and built a system around it;
Dollard and Miller have focused on
extinction and discrimination learning;
and the followers of Skinner rely
almost exclusively on methods of reward.

This stress on a few learning
principles at the expense of neglect-
ing other relevant ones will serve 
only to limit the effectiveness of 
psychotherapy. 

A second factor that may account 
for the discontinuity between general 
psychology and psychotherapeutic
practice is that the model of person- 
ality to which most therapists sub- 
scribe is somewhat dissonant with 
the currently developing principles 
of behavior. 

In their formulations of personal- 
ity functioning, psychotherapists are 
inclined to appeal to a variety of 
inner explanatory processes. In con- 
trast, learning theorists view the 
organism as a far more mechanistic 
and simpler system, and consequently 
their formulations tend to be ex- 
pressed for the most part in terms of 
antecedent-consequent relationships 
without reference to inner states. 


Symptoms are learned S-R connections; 
once they are extinguished or deconditioned 
treatment is complete. Such treatment is 
based exclusively on present factors; like 
Lewin's theory, this one is a-historical. Non- 
verbal methods are favored over verbal ones, 
although a minor place is reserved for verbal 
methods of extinction and reconditioning. 
Concern is with function, not with content. 
The main difference between the two theories 
arises over the question of “symptomatic” 
treatment. According to orthodox theory, 
this is useless unless the underlying complexes 
are attacked. According to the present 
theory, there is no evidence for these putative 
complexes, and symptomatic treatment is all
that is required (Eysenck, 1957, pp. 267-268).
(Quoted by permission of Frederick A. Praeger, Inc.)


Changes in behavior brought about through such methods as counterconditioning are apt to be viewed by the “dynamically-oriented” therapist as being not only superficial, “symptomatic” treatment, in that the basic underlying instigators of the behavior remain unchanged, but also potentially dangerous, since the direct elimination of a symptom may precipitate more seriously disturbed behavior.

This expectation receives little 
support from the generally favorable 
outcomes reported in the studies re- 
viewed in this paper. In most cases 
where follow-up data were available 
to assess the long-term effects of the 
therapy, the patients, many of whom 
had been treated by conventional 
methods with little benefit, had evi- 
dently become considerably more 
effective in their social, vocational, 
and psychosexual adjustment. On 
the whole the evidence, while open to 
error, suggests that no matter what 
the origin of the maladaptive behav- 
ior may be, a change in behavior 
brought about through learning pro- 
cedures may be all that is necessary 
for the alleviation of most forms of 
emotional disorders. 

As Mowrer (1950) very aptly points out, the “symptom-underlying cause” formulation may represent inappropriate medical analogizing. Whether a given behavior will be considered normal or a symptom of an underlying disturbance will depend on whether somebody objects to the behavior. For example, aggressiveness on the part of children may be encouraged and considered a sign of healthy development by the parents, while school authorities and society view the same behavior as a symptom of a personality disorder (Bandura & Walters, 1959). Furthermore, behavior considered to be normal at one stage in development may be regarded as a “symptom of a personality disturbance” at a later period. In this connection it is very appropriate to repeat Mowrer’s (1950) query: “And when does persisting behavior of this kind suddenly cease to be normal and become a symptom?” (p. 474).

Thus, while a high fever is generally considered a sign of an underlying disease process regardless of when or where it occurs, whether a specific behavior will be viewed as normal or as a symptom of an underlying pathology is not independent of who makes the judgement, the social context in which the behavior occurs, the age of the person, and many other factors.

Another important difference between physical pathology and behavior pathology, usually overlooked, is that, in the case of most behavior disorders, it is not the underlying motivations that need to be altered or removed, but rather the ways in which the patient has learned to gratify his needs (Rotter, 1954). Thus, for example, if a patient displays deviant sexual behavior, the goal is not the removal of the underlying causes, i.e., sexual motivation, but rather the substitution of more socially approved instrumental and goal responses.

It might also be mentioned in passing that, in the currently popular forms of psychotherapy, the role assumed by the therapist may bring him a good many direct or fantasied personal gratifications. In the course of treatment the patient may express considerable affection and admiration for the therapist, he may assign the therapist an omniscient status, and the reconstruction of the patient’s history may be an intellectually stimulating activity. On the other hand, the methods derived from learning theory place the therapist in a less glamorous role, and this in itself may create some reluctance on the part of psychotherapists to part with the procedures currently in use.

Which of the two conceptual theories of personality, the psychodynamic or the social learning theory, is the more useful in generating effective procedures for the modification of human behavior remains to be demonstrated. While it is possible to present logical arguments and impressive clinical evidence for the efficiency of either approach, the best proving ground is the laboratory.

In evaluating psychotherapeutic methods, the common practice is to compare changes in a treated group with those of a nontreated control group. One drawback of this approach is that, while it answers the question as to whether a particular treatment is more effective than no intervention in producing changes along specific dimensions for certain classes of patients, it does not provide evidence concerning the relative effectiveness of alternative forms of psychotherapy.

It would be far more informative 
if, in future psychotherapy research, 
radically different forms of treat- 
ment were compared (King & Armi- 
tage, 1958; Rogers, 1959), since this 
approach would lead to a more rapid 
discarding of those of our cherished 
psychotherapeutic rituals that prove 
to be ineffective in, or even a handi- 
cap to, the successful treatment of 
emotional disorders. 
Exploration of personality by multivariate experimental methods, as a means of objectively determining personality structure, has revealed, on the one hand, an array of stable, meaningful, cross-checking structures (Cattell, 1946, 1957; French, 1953), and on the other, some baffling inconsistencies. The latter have recently been pointed out by Becker (1960), apparently in criticism of the present writer’s personality theory. However, they have been known for several years and were, in fact, first brought to light by Cattell and Saunders (1950). Nevertheless, Becker does a service in advertising these facts, for psychologists have greatly neglected the solution of the problems revealed in this field.

The present writer’s theoretical 
position is that it is conceptually cor- 
rect to speak of the same unique 
source trait, e.g., cyclothymia-schizo- 
thymia, anxiety, ego-strength, sur- 
gency-desurgency, as something ex- 
pressing itself (in terms of recogniz- 
able, replicable factor patterns) 
across all three possible media of 
experimental observation. That is to 
say, the same influence should appear 
in L data (life record, behavior in 
situ), Q data (questionnaire, consult- 
ing room, verbal self-evaluation), and 
T data (objective, laboratory, minia- 
ture situational, non-self-evaluative 
test performances). 

In the article (Becker, 1960) to which I reply, the fact that the actual correlation between the L-data and Q-data estimates of what are apparently equivalents in the two media sometimes falls far short of perfection is accepted as disproof of this theory. This theoretical conclusion is unsubtle. The thesis of my reply is that countless threads of evidence support the view that the same abstract personality source trait commonly operates across different media. However, certain “perturbations” have to be recognized which prevent the simple relation from appearing on the surface, and these need to be taken into account in understanding psychological measurement generally.

In this area of scientific investigation, Becker has not asked the right question. Unexpected but systematically evaluated perturbations of existing laws have often led to new discoveries, not so much by rejecting a law as by extending it. An example from astronomy is the discovery of Neptune through observed perturbations in the expected orbit of Uranus. So here, it is argued that there is no reason to abandon the notion of unitary source traits (Cattell, 1946), but that one must recognize certain new concepts, which we have introduced under the terms situational, instrument, and refraction factors. These are supported partly by marshaling existing evidence, and partly by experiments undertaken ad hoc. Because of an editorial veto on space to reply, those experiments have been reported in a separate publication (Cattell, 1960).
THE DEFINITION OF INSTRUMENT FACTORS

The first and major source of perturbation in transmedia factor matching arises from what may be called instrument factors. Apparently, the first explicit recognition and demonstration of an instrument factor occurred in a structural analysis of a very widely selected set of objective personality tests by Cattell and Gruen (1955). There, a factor appeared that was literally produced by diurnal variations in the sensitivity of a brass instrument (GSR). This purely instrumental influence created a factor by throwing common variance into all types of personality measures in which the instrument was used. Such factors have appeared since in publications by Holtzman and Bitterman (1956), F. L. Damarin, D. T. Campbell, and L. Berwyn (unpublished), and several other unpublished studies known to the writer. Indeed, wherever questionnaire variables are mixed with ratings, attitude scales with questionnaires, or, sometimes, even one type of answer form with another, one or more factors may generally be found covering all variables having formal similarity.

The difficulty factors of Wherry and Gaylord (1944) and of Dingman (1958) should definitely be regarded as a subspecies of instrument factor. Recently, in a study by Mayeske (1961) of the Music Preference Test of Personality (Cattell & Anderson, 1953), an instrument factor appeared that even separated all items resting on one form of musical recording from those based on another technique. Instrument factors have become better understood in the last couple of years through extensive studies of their appearance in objective motivation structure analyses (Cattell, Radcliffe, & Sweney, 1960; R. B. Cattell & J. Horn, unpublished). There they appear as “vehicle factors” covering all objective devices that use the same vehicle, e.g., information or autism, for the objective measurement of motivation strength. In this and many similar contexts, it has been shown that instrument factors can be fairly clearly eliminated by ipsative scoring (R. B. Cattell & J. Horn, unpublished; see Table 1).
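The effect of ipsative scoring is easiest to see in the limiting case of an instrument influence that adds the same amount to every score a person obtains in one medium. The Python sketch below assumes that simple additive model. The trait values, the loading, and the function names `ipsatize` and `simulate` are hypothetical and are not taken from the studies cited. Subtracting each person's mean across the variables removes the common component exactly, while the within-person differences that carry the content information remain.

```python
import random


def ipsatize(scores):
    """Ipsative scoring: re-express each person's scores as deviations
    from that person's own mean across all variables in the set."""
    result = []
    for row in scores:
        person_mean = sum(row) / len(row)
        result.append([x - person_mean for x in row])
    return result


def simulate(n_people=200, n_vars=6, instrument_weight=1.5, seed=1):
    """Hypothetical model: observed score = trait part + instrument part,
    where the instrument part is one value per person added equally to
    every variable in the medium (a uniform instrument factor)."""
    rng = random.Random(seed)
    trait_part, observed = [], []
    for _ in range(n_people):
        traits = [rng.gauss(0, 1) for _ in range(n_vars)]
        instrument = instrument_weight * rng.gauss(0, 1)
        trait_part.append(traits)
        observed.append([t + instrument for t in traits])
    return trait_part, observed


trait_part, observed = simulate()
ips_observed = ipsatize(observed)
ips_traits = ipsatize(trait_part)

# The uniform instrument component cancels: ipsatized observed scores
# match ipsatized trait-only scores up to floating-point rounding.
max_diff = max(
    abs(a - b)
    for row_o, row_t in zip(ips_observed, ips_traits)
    for a, b in zip(row_o, row_t)
)
print(f"largest difference after ipsatizing: {max_diff:.1e}")
```

The removal has a cost. Ipsatizing also discards any genuine trait variance that is common to all variables in the set, and it forces each person's scores to sum to zero. In this sketch, then, a uniform instrument factor is removed only at the price of that information.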

Before proceeding beyond this 
introduction by illustrations, to a 
more comprehensive definition of the 
concept of instrument factor, it is 
desirable, however, to make clear 
which peripheral factors are not to be 
included. This can be done most 
compactly by Figure 1, presenting a 
hierarchy which will be clear to 
multivariate experimentalists. Inci- 
dentally, the term “artifactors” is 
due to Roberts (1959), and has been 
sharpened by additional conditions 
here to make their separation from 
instrument factors cleaner. 

The justification for the labels of the three forms of “perturbing” factors reproducible across experiments (matrices) will be given as we proceed. Concentrating first on instrument factors, let us note that they are definable, initially, only in terms of intention and perspective. Later, the definition can be made more satisfactory as we develop precise concepts indicating various universes of variables. The reason is that a quality may persist across the differences of content of a series of opinionnaires of similar form, and may perhaps consist of response to a particular form inherent in this instrument, irrelevant to the content interest of the experimenter. Such a quality may yet represent behavior dependent on a real personality trait. For example, what emerges as an instrument factor covering the variables of similar form, a1 ... an, may well load some important general personality factor when a1 ... an are condensed to a single variable a and set in the new context of variables b, c, d, etc.¹

[Figure 1 not reproduced. Recoverable node labels: Error Factors; Predictive Factors; Experiment “Incidental” Factors; Computation Score-Dependence Factors; Observer Perception and Projection Factors; Form-Specific Instrument Factors; Common General Stimulus Situation Factors; Common Response-Observation Form Factors.]

Fig. 1. The place of instrument factors in a taxonomy of factors.

There is thus a sense in which an instrument factor is a matter of perspective, i.e., of one’s starting point and of the plane of experience from which one chooses the majority of one’s tests. In this sense, just as dirt is only “matter in the wrong place,” so an instrument factor is only “variance where we didn’t expect it or don’t want it.” When we are measuring personality by questionnaire, we obviously do not want each of the diverse personality dimensions included to be contaminated by what might be called a “generalized specific,” i.e., a specific to questionnaires. That specific may, indeed, be more than a trivial specific. It may be an expression of a single important personality factor spread over and contaminating all the supposedly diverse personality measures. That does not make the measurement any more acceptable!

1 Incidentally, it is the failure to recognize this perspective which, in the present writer's opinion, has made so much recent work on response sets a rather uneconomical use of psychological research time. During the late 1950s, educational psychometrists “discovered” several such sets in their opinionnaire tests: response sets (Cronbach, 1950), social desirability sets (Edwards, 1957), extremity of response sets (Berg, 1955), and acquiescence, the tendency to agree, yes-vs.-no (Messick & Jackson, 1961). These had already been employed by designers of objective personality tests in the late 1940s and early 1950s (Cattell, 1946; Cattell & Gruen, 1955). In the context of broader personality theories and the varied behavioral measures involved, it had already become clear what these “flaws” were. Itemetrists, without knowledge of the literature in this area, later treated them merely as flaws in their paper-and-pencil tests. They were actually expressions of well-defined personality factors, e.g., anxiety or UI 24, comention or UI 20, superego rigidity or UI 29, as well as UI 31 (Cattell, 1957).

When more progress has been made toward a systematic taxonomy of tests, on some objective basis such as that worked out by Cattell and Warburton (in press), it will become possible to set up a relatively objective classification of instrument factors as well. That classification would follow the types of personality approach to which the factors are tangential. “Form” and “content” are quite subjective categories, and, in any case, they by no means exhaust the possible planes of experiment into which instrument factors can intrude orthogonally. For the time being, however, we must take a relativistic position, and one centered in “content.” On this basis we shall contingently define an instrument factor as any uniquely (simple structure) rotated factor which meets two conditions. First, it covers a whole set of diverse variables having formal resemblance in presentation, mode of permitted response, or scoring. Second, it does not extend to tests of the same psychological content when these are couched in other modes of formal presentation, response, etc.


THEORY OF SOURCES OF PERTURBATION AFFECTING TRAIT ALIGNMENT


It should be noted that there are two distinct, though related, senses in which a source trait can be said to be the same or not the same in two different media:

1. An estimate of the factor from 
the variables in one medium may 
correlate less than unity with its 
estimate from variables in another 
medium, even when attenuation-cor- 
rected for (a) unreliability of meas- 
urement, and (b) imperfection of 
estimate. 

2. It may not be possible to dis- 
cover a trait, when factoring both 
media together, which has simple 
structure across both media and also 
possession of the hypothesized, simi- 
lar-meaning salient loadings in both 
media. (Whether one also means 
that the simple structure position in 
one medium will not project into the 
other we shall discuss below.) 
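Correction (a) in Sense 1 is the classical correction for attenuation. The observed correlation between two fallible measures is divided by the geometric mean of their reliabilities, r_true = r_xy / sqrt(r_xx * r_yy). The Python sketch below assumes Spearman's standard formula and uses illustrative, hypothetical reliability values. Correction (b), for imperfect factor estimation, is not shown.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: estimate the correlation
    between true scores from the observed correlation r_xy and the
    reliabilities rel_x and rel_y of the two measures."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical example: an L-data and a Q-data estimate of the "same"
# source trait correlate 0.56, with reliabilities 0.80 and 0.70.
observed = 0.56
corrected = correct_for_attenuation(observed, 0.80, 0.70)
print(f"observed r = {observed:.2f}, corrected r = {corrected:.2f}")
# prints: observed r = 0.56, corrected r = 0.75
```

In this hypothetical case an observed correlation of 0.56 rises to about 0.75. A transmedia correlation that "falls far short of perfection" may therefore still be substantially attenuated, and it has to be corrected before it can count against alignment in Sense 1.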

Becker has been concerned with the first of these, denying alignment without first checking that Corrections a and b could not restore the correlation to unity. In any case, the second meaning is more important. If unity in this sense holds, personality theory is profoundly simplified, and it is only a matter of the mechanics of statistics to produce weighted measures from the two media that will approach a correlation of unity.

In the larger collation of data and 
new experimental work (Cattell, 
1960), from which the present article 
abstracts, it has been shown that the 
presence of unrecognized instrument 
factors in the two media will prevent 
alignment either in Sense 1 or 2, un- 
less special new techniques are used. 
Before devoting a section to closer 
inspection of this result, however, it 
is desirable to set out a clear theory 
about more general sources of per- 
turbation. For, in principle, one can 
see that there are some six possible 
origins of the failure to find a one-to- 
one alignment of primary personality 
factors measured in one medium with 
those measured in another. Some of 
these will produce instrument factors; 
others will contribute to other kinds 
of nonalignment to be described. 


Sources of Nonalignment 


Human transmission (perception, evaluation, projection, memory) of score values. Largely this means rating and self-rating (L and Q data). This field is too subtle and complex to be illustrated in the space available for the present abstract summary (see Cattell, 1960), and it has hitherto been handled too simply in terms of “halo.” Theoretically, the pattern of correlations, and therefore of obtained factors, could be distorted in two ways and only two. The first is by properties of the individual, and of his relation to the recorder, which affect the recording of all his behavior variables. The second is by properties of the perceiving recorder. The former can be divided into (a) value relationships, of which liking-disliking (a constituent in halo) is only one; and (b) perspicacity or visibility effects, e.g., extraversion making the ratee better known, or position effects making certain behaviors more evident. The latter can be divided into projections of (a) stereotypes or cultural clichés,³ and (b) refraction factors peculiar to one medium, discussed below. In all the “perceiving recorder effects,” a correlation is produced by the “projection” of a (perhaps quite unconscious) conviction that certain variables go together. Some of these effects may produce typical instrument factors, loading all variables in the medium uniformly and about equally. Others may load only some variables, producing what are perhaps best described as “perception-evaluation” projection factors, which are not true instrument factors.

2 Such a procedure should be sharply distinguished from what Becker (1960) appears to advocate, and describes Gough as doing, namely, forcing a Q scale to align itself with an L factor by assiduous item selection. Any such procedure contributes nothing to our knowledge of structure and only hides the problem. Suppose it succeeds, and our theory is correct that L-data factors are the most heavily contaminated with irrelevant factors of any medium. Then the procedure is forcing a poorly oriented measure to agree with a still poorer one.

Communality of variables in respect to some trait required for handling a similar formal performance in all of them (or for registering in an observation situation). This is essentially one of the two main sources of instrument factors only (see following paragraph). The countless possibilities may be illustrated by the use of 30 scales in all of which the score (in one direction or the other) depends on an ability to read, or on information or skill of expression, or on a tendency to say yes rather than no, etc.

3 Sociologists have ruined “stereotype” by applying it equally to a widespread concept which either (a) does or (b) does not correspond to statistical reality. I therefore suggest “cultural cliché” explicitly for a widespread cultural concept which is significantly different from any externally existing pattern.

Communality of variables in respect to scoring or scaling applied after administration. Quite apart from common demands on the subject's actual performance, as in the previous paragraph, anything in the formal scoring procedure which tends to give similar sigmas and skewness (and in some cases means) throughout one class of tests will tend to create higher correlation among them and a common factor. Suppose the matrix of correlations of tests a1 through an were just the same, on a rank formula, as that for b1 through bn. If all the a’s, on the one hand, and all the b’s, on the other, have similar distributions, then basing the matrix afresh on a product-moment formula will tend to give an instrument factor for the a’s and/or the b’s separately.

Coincidence of different global stimu- 
lus situations with different test media 
administrations. If a person answered 
one set of questionnaires in private, 
and another orally and publicly 
(which is akin to the interview or be- 
havior rating situation), we should 
expect real differences in response 
due to the actual stimulus situation, 
covering the occasion on which all 
items of one test were answered, be- 
ing different from that covering the 
other test-taking setting. A priori 
this could create both an instrument 
factor, conterminous with each me- 
dium-situation, and also a change in 
loading of the same items on the 
same personality factors in the two 
situations.

Habitual broad area differences in actual trait development and expression. Among children, for example, we should expect the particular behavior variables representing, say, the dominance factor to be expressed to different degrees in the home environment and in the school environment. This is analogous to the point in the above paragraph, except that the influence is expressly conceived to lie not in the temporary measurement situation itself but in the prolonged life situation. It therefore leads to real differences of actual habit strength, i.e., of the trait itself. Factor analytically, this might produce either of two structures. One is a home dominance factor and a school dominance factor, representing the relative impact of home and school, respectively. The other is a single dominance factor modified by two further factors, each peculiar to one broad area. If the former proves to be more characteristic, then we can confidently predict that the two first-order factors will correlate highly and yield a single second-order dominance factor. Even if the former is true, a rough factoring might show the structure as that of a home and a school instrument factor, as in the second possibility. Psychologically, however, if the proper structure is obtained, the interpretation would now be different from an instrument factor effect. The area differences would then be interpreted as real structure differences, and the concept of a single dominance trait would be discovered and justified only at the second-order factor level.

Differences among media in density of representation of variables. Suppose an experimenter sampling variables in the ability field happened to take one variable for each of Thurstone’s primary abilities and factored them. He would obtain, straightaway (i.e., as a first-order factor), the general ability factor which, in any “dense” representation of variables, appears only as a second-order factor (Thurstone, 1938). This concept of density of variable representation has been developed further elsewhere (Cattell, 1957, pp. 808-817). It is easy to see that if there were really large, unrecognized differences of density between media, we should obtain no correlational alignment of the primaries in the two fields. Only on exploring the second order would it become possible to discover that a second order in one medium is the same as a first order in the other.
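The density effect can be shown with a model-implied correlation matrix. The Python sketch below assumes a hypothetical two-level ability model: a general factor plus uncorrelated primaries, with made-up loadings of 0.6 and 0.5. When only one variable represents each primary, every off-diagonal correlation has the same value, so the matrix fits exactly one common factor and the general factor appears at the first order. With several variables per primary, within-primary clusters appear, the primaries surface first, and the general factor moves up to the second order.

```python
def implied_correlations(primary_of, g_load=0.6, p_load=0.5):
    """Model-implied correlation matrix for standardized variables in a
    hypothetical hierarchical ability model: each variable loads g_load on
    a general factor and p_load on its own primary (all factors
    uncorrelated); the remainder of its variance is unique."""
    n = len(primary_of)
    R = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = g_load ** 2              # variance shared via g
                if primary_of[i] == primary_of[j]:
                    shared += p_load ** 2         # plus the shared primary
                R[i][j] = shared
    return R


def distinct_off_diagonal(R):
    """Sorted set of distinct off-diagonal correlation values."""
    return sorted({round(R[i][j], 6) for i in range(len(R))
                   for j in range(len(R)) if i != j})


# Sparse sampling: one variable for each of seven hypothetical primaries.
sparse = implied_correlations(primary_of=list(range(7)))
# Dense sampling: three variables for each of the same seven primaries.
dense = implied_correlations(primary_of=[k for k in range(7) for _ in range(3)])

print("sparse:", distinct_off_diagonal(sparse))  # a single value: one common factor
print("dense: ", distinct_off_diagonal(dense))   # two values: primaries within g
```

In the sparse case the matrix equals c times a matrix of ones plus (1 - c) times the identity, which is exactly a one-factor model with equal loadings of sqrt(c). Nothing in the correlations reveals that the primaries exist.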

Actually, as soon as systematic ex- 
ploration of second-order structure 
in questionnaires reached to six 
factors (Cattell, 1957; Cattell & 
Scheier, 1961; Cattell & Warburton, 
in press), it became evident that 
four second-order questionnaire fac- 
tors aligned with four first-order ob- 
jective test factors (UI 19, 20, 24, 
and 32); and in two of these, UI 24 
(anxiety) and UI 32 (extraversion), 
the agreement is perfect within small 
limits of experimental error. An in- 
stance from a different realm, but 
amounting to a correlation of only 
0.80 between the two media, exists in 
Tollefson’s demonstration (1961) that 
the second-order extraversion factor 
in the questionnaire is a first-order 
factor in the Humor Test of Person- 
ality. These alignments (from the ear- 
lier, 1954-1957, publications above) 
are not mentioned in Becker’s article 
(1960), perhaps because his com- 
ments are all on L- and Q- (rather 
than T-) data alignments. But the 
findings are highly relevant as show- 
ing that there does exist a corner of 
the intermedia jigsaw puzzle which 
is beginning to fit in place. These 
five experimental instances alone are 
surely sufficient to encourage us in 
that rejection of nihilism which this 
article undertakes. 

To risk a prediction in the little-explored field of “density,” one might judge that variables in Q data will prove somewhat more “dense” than those in L data. On substance, however, the above evidence supports only one conclusion: variables as commonly chosen are much more dense in Q than in T data. This is understandable. In the T-data anxiety factor, for example, we test startle response by a single cold pressor test (Cattell & Scheier, 1961), whereas most anxiety questionnaires contain a dozen items asking in different ways how easily the person startles. Cronbach (1960), Comrey, and others who criticize low homogeneity when reviewing factor scales are perhaps unwittingly driving their flocks toward a more serious danger. That danger is using personality scales heavily loaded with spurious “specific” variance of this latter kind, instead of ensuring that their scales deal with personality factors having broad psychological relevance and effectiveness.

If the above search for sources of 
perturbation has been truly exhaus- 
tive, our summary must include 
three other forms of distortion be- 
sides instrument factors, constituting 
four in all, as follows (beginning with 
instrument factors): 

1. Test instrument factors, includ- 
ing common test form (response- 
observation-score) factors, and com- 
mon test general stimulus situation 
factors. 

2. Modification of actual trait by 
influences peculiar to one area of ex- 
pression, producing primaries for 
each area and requiring conceptual 
unity to be sought at a higher order 
level. 

3. Difference of density of representation of variables, as commonly and unconsciously chosen by experimenters in their different media, resulting in a higher order in one medium matching a lower order in another.

4. Perception-evaluation or projection factors, which trespass on the variance of the variables used to estimate personality factors. They do not load all variables in one medium uniformly, as an instrument factor does. Instead, each has a characteristic form, and, when restricted to one medium, each has the properties of the refraction factors described below.


THE PRACTICAL PROBLEM OF REACHING PERSONALITY STRUCTURE DESPITE DISTORTIONS


If the above theoretical analysis is 
correct the manifest correlational 
picture of personality structures will 
be less like Whistler’s portrait of his 
mother than the cubist’s rendering 
of the same, fractured into surpris- 
ing new supernumerary planes and 
facets. To translate from the latter 
to the former, it is necessary that 
research, first, check the hypotheses 
about the forms of distortion at work 
and, second, find experimental and 
statistical means for isolating and 
setting aside these various perturbing 
influences. 

One cannot do more than glance at these tasks here. As to the first, our initial examination of data shows definitely that form-specific instrument factors exist, and my colleagues and I have also begun to give evidence for Sources 2, 3, and 4. Source 2 (local area modification of real traits) has been more fully illustrated elsewhere (Cattell, 1960), but its systematic investigation must be left to others. Source 3, changing density with changing medium, has already been substantiated.

As to the second task, segregating the distorting influences to arrive at essential structure, Effects 2 and 3 above can be unraveled straightforwardly by second-order factoring. One qualification applies: it has been suggested above that Source 2 could produce two instrument factors beyond a single first-order factor, instead of two first orders resolving into a second.

Setting 2 and 3 aside, therefore, we shall devote the present section to unraveling the effect of instrument factors (1 above), and the following section to perception-evaluation-projection phenomena (4 above).

The special experiments with in- 
strument factors described elsewhere 
(Cattell, 1960) proceeded first to find 
what happens when one factors cor- 
relation matrices derived from known, 
numerically stated factor models, 
and secondly, to experiment with 
varieties of solution in actual psy- 
chological data where the existence 
and boundaries of an instrument 
factor were well known beforehand. 
These experiments showed that: 

1. Where the instrument factor 
covers all variables, i.e., where they 
are not embedded in a larger matrix, 
with other media to constitute a hy- 
perplane and determine unique rota- 
tion, the typical investigator and pro- 
cedure will not find or be aware of the 
instrument factor. 

2. If the instrument factor is not found, then either (a) the correlations among the primaries will be distorted (if it is positive on all variables and they are all positively correlated, it will increase their correlations), or (b) the simple structure which really exists among the primaries will not be found, or will be found only in very impaired form. Commonly b will predominate, but both will operate.
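Point 2(a) can be illustrated by simulation. The Python sketch below assumes a hypothetical linear model in which the loadings, error size, and sample size are all invented. Two personality traits that are truly uncorrelated are each measured by one scale in the same medium, and both scales also pick up a shared instrument component with a positive loading. That instrument component alone manufactures a correlation between the two scales, close to the model-implied value of 0.36 / 0.97, or about 0.37.

```python
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_scores(n=20000, trait_load=0.6, instrument_load=0.6,
                    error_sd=0.5, seed=7):
    """Hypothetical model: two independent traits, each measured by one
    scale; both scales share one medium-specific instrument component."""
    rng = random.Random(seed)
    scale_a, scale_b = [], []
    for _ in range(n):
        t1, t2 = rng.gauss(0, 1), rng.gauss(0, 1)   # uncorrelated traits
        inst = rng.gauss(0, 1)                       # shared instrument influence
        scale_a.append(trait_load * t1 + instrument_load * inst
                       + rng.gauss(0, error_sd))
        scale_b.append(trait_load * t2 + instrument_load * inst
                       + rng.gauss(0, error_sd))
    return scale_a, scale_b


a0, b0 = simulate_scores(instrument_load=0.0)
a1, b1 = simulate_scores(instrument_load=0.6)
print(f"without instrument factor: r = {pearson(a0, b0):+.3f}")
print(f"with instrument factor:    r = {pearson(a1, b1):+.3f}")
```

With real primaries that are already positively correlated, the same mechanism adds the squared instrument loading to each covariance. This is the inflation described in 2(a).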

After this demonstration of the 
effect of an instrument factor in a 
single medium we proceeded to 
models and real instances containing 
blocks of variables uniformly from 
each of two or three media. Herein 
each medium was covered by one 
instrument factor but where true per- 
sonality factors existed in the sense of 


167 


having a simple structure position 
with salient loadings on variables of 
similar meanings in both media. Here 
it was shown: 

1. If one obtains the best possible 
simple structure (perhaps imperfect 
because of mixed-in instrument fac- 
tor) among variables separately in 
each medium, the same simple struc- 
tures cannot usually be found when 
the media are put together. 

2. One reason for this is that if one 
projects the simple structure posi- 
tion satisfactorily obtained in one 
medium into the second,⁴ it definitely
does not give simple structure within 
the second. 

3. If, however, one first admits the 
existence of, and locates by simple 
structure in the combined matrix, the 
instrument factors (which can now 
have determinate hyperplanes), then 
the true personality factors, operat- 
ing across both media, can be located 
(in blind simple structure rotation). 
A successful example of this in real 
data—objective motivation measure- 
ment (R. B. Cattell & J. Horn, un- 
published)—is shown in Table 1 
here, and in other models elsewhere 
(Cattell, 1960). Our ignorance of 
this principle in 1948 was presum- 
ably responsible for the chaotic out- 
come of the first extensive transme- 
dium factor analyses (Cattell & Saun- 
ders, 1950, 1955). 

Incidentally, it will be obvious 
that missing the instrument factor, 
failing to rotate it correctly if one 
does not miss it, and encountering 
the subsequent distortion are due 
respectively to (a) the lack of a test 
for factor extraction that will decide, 


4 This cannot be done, of course, simply by
applying the same discovered transformation
(Λ) matrix to the centroids, because the latter
begin at different positions. One first dis-
covers by the Procrustes program the Λ most
nearly reproducing the first medium simple
structure from the joint medium centroid.
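The fitting step in Footnote 4 can be sketched as an ordinary least-squares problem. The aim is to find the transformation that carries the joint-medium loadings of the first-medium variables as close as possible to the simple-structure loadings those variables had when the first medium was factored alone. This is a minimal sketch that assumes NumPy. The function name `fit_transformation` and the example numbers are hypothetical. The original Procrustes program may include further steps, such as normalizing the transformation columns, and these are not reproduced here.

```python
import numpy as np


def fit_transformation(joint_loadings: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Least-squares T minimizing ||joint_loadings @ T - target|| (Frobenius norm).

    joint_loadings: unrotated loadings of the first-medium variables taken from
        the joint-medium solution (variables x joint factors).
    target: simple-structure loadings of the same variables from the
        first medium factored alone (variables x first-medium factors).
    """
    T, *_ = np.linalg.lstsq(joint_loadings, target, rcond=None)
    return T


# Hypothetical check: build a target from a known transformation and recover it.
V = np.array([[0.7, 0.1], [0.6, 0.2], [0.1, 0.8], [0.2, 0.7], [0.4, 0.4]])
T_true = np.array([[0.9, -0.3], [0.4, 1.0]])
T_hat = fit_transformation(V, V @ T_true)
print(np.round(T_hat, 3))
```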
TABLE 1

PSYCHOLOGICAL AND INSTRUMENT FACTORS AS FOUND IN OBJECTIVE,
DYNAMIC TRAIT SIMPLE STRUCTURE

Attitude Variable and Device Measurement

1 Desire for good self-control. Information measure
2 Wish to know oneself. Information measure
3 Wish never to become insane. Information measure
4 Readiness to turn to parents for help. Information measure
5 Feeling proud of one’s parents. Information measure
6ᵃ Desire to avoid fatal disease and accidents. Information measure
7ᵃ Wish to get protection from A bomb. Information measure
8 Desire for good self-control. Autism measure
9 Wish to know oneself. Autism measure
10 Wish never to become insane. Autism measure
11 Readiness to turn to parents for help. Autism measure
12 Feeling proud of one’s parents. Autism measure
13ᵃ Desire to avoid fatal disease and accidents. Autism measure
14ᵃ Wish to get protection from A bomb. Autism measure


Factor Matrix 


Psychological Factors | Instrument Factors

Escape | Sentiment to Parents | Self-Sentiment | Information Device Factor | Autism Factor

-02 26 03
-05 31 19
12 04
09 -01
-01 01
04 13 -02
-08 03 -05
-04 30
07 37 31
-0 25
09 42
01 14
20 01 17
13 09 10


Note. The theoretically required salients to define the factors are boxed in, and except for two values at the
bottom of the parental sentiment factor column, the salients are high (above .09) where, and only where, they are
theoretically required to be.

ᵃ Attitudes 13 and 14 are the same as 6 and 7, but in a different medium, and similarly for the other cross-media
personality factors.


to within less than an error of two or 
three factors, how many should be 
extracted; (b) having no variables 
from other media to give a hyper- 
plane for it; and (c) the variance 
that should have been in the instru- 
ment factor being pushed into the 
personality factors, destroying the 
clarity of their hyperplanes. The 
remedy which worked in the above 
cases was to give good technical at- 
tention to these issues. 


ON ISOLATING TRANSMEDIUM 
PERSONALITY FACTORS AND 
REFRACTION FACTORS 


Our final step consisted in return- 
ing to the actual L and Q data from 
which Becker infers that personality 
factors are unmatchable across me- 
dia, and showing that when exam- 
ined by more penetrating concepts, 
as above, uniquely determinate, psy- 
chologically meaningful, factor pat- 
terns appear, expressing themselves 
appropriately in both media for each 
factor. This has theoretical interest 
in giving additional substance to 
Point 3 above, by introducing the no- 


tion of refraction factors, and in pro- 
ducing some order in that L-Q fron- 
tier which has hitherto been the most 
hopelessly obscure of the transmedia 
relationships. Nevertheless, this ap- 
proach does no more than reveal 
some order, and at the same time 
opens the door on a lot of problems, 
particularly in the field of behavior 
rating, which will now demand sys- 
tematic investigations. 

It is not easy to find in any pub- 
lished study of the past 20 years 
(ever since personality structure re- 
search began in earnest) an experi- 
ment really adequate in reaching the 
technical conditions necessary to
get anywhere on this question. One
needs, among other things, an experi-
ment: (a) on a sufficient sample for
sampling errors not to be intrusive;
(b) where the subjects had a long
testing period in which they were
simultaneously rated in situ and sub-
jected to questionnaires, comprehen-
sive, reliable, and valid enough to
define several factors clearly; (c)
where ratings and questionnaire vari-
ables were strategically chosen to
represent psychologically familiar
factors, already vouched for by ear- 
lier researches; and (d) where ratings 
were carried out by peers and under 
the requisite conditions described 
elsewhere (Cattell, 1946, 1957). Prob- 
ably the most satisfactory data avail- 
able is that in which the experi- 
mental work was broadly conceived 
and painstakingly carried out by 
Coan, on 7.8-year-old children (Cat- 
tell & Coan, 1957, 1958). It suffers 
only with respect to d, in that ratings 
were made by teachers instead of 
peers, and perhaps in reduced homo- 
geneity of sample through equal in- 
clusion of boys and girls. 

Taking the data of this experiment 
we find that 24 rating variables have 
already been factored and blindly ro- 
tated into 12 very definite simple 
structure factors, each represented by 
two markers (see Table 5 in Cattell, 
1960). Similarly, 24 variables in Q 
data, each consisting of a scale of 
about eight items, have been re- 
solved as 12 well known simple struc- 
ture factors, each marked essentially 
by two salient variables. However, 
on psychological inspection of these 
resolutions, the hypothetical position 
was taken that only 10 of the 12 
factors were common to the two 
matrices, the remaining 4 being spe- 
cial, 2 to each matrix. 

The two sets of 24 variables were 
now combined and intercorrelated in 
a cross-medium, L-Q matrix of 48 
variables, which, by Tucker's test, 
yielded 16 factors. (With the hy- 
pothesis of matching, above, one 
would expect 14, but it is usual to 
find some new factor created by the 
mixture when two matrices are 
pooled.) The structure of this new 
factor space proved to be complex. 
Projection of simple structure ob- 
tained in one into the other, as de- 
scribed earlier (Footnote 4), would 
not yield a good combined simple 
structure. Attempts to force simple
structure by varimax, oblimax, or 
other ‘analytical’ programs failed 
because these rigid programs could 
not recognize and uniquely rotate 
the instrument factors, which, on the 
basis of the above principles and 
findings, we knew must be present. 
Only a patient and comprehensive 
exploratory visual rotation (aided by 
the photographic Rotoplot program 
on Illiac), over 22 rotations, yielded 
a position of such stability that one 
could repeatedly return to it. In 
reaching this position we found that 
the hyperplanes in the data were 
noticeably a little broader (about 
±.13 instead of ±.10) than those
existing in one medium alone. 

On examining the solution, set out
in Table 2,⁵ we found that we had
essentially an instrument factor for
L data and another for Q data (not
set out at the end of the matrix, but
marked InL and InQ in Table 2).
There are also two other factors,
which we would guess might be pro-
jected “clichés,” numbered 13 and
16. The interesting fact is that when
this debris is set aside, patterns for
the well known personality dimen-
sions C (Ego strength), D (Excita-
bility), F (Surgency), and H (Parmia)
appear, with the appropriate four
markers (2L and 2Q) on each, though
the hyperplanes are pierced by one
or two random appreciable loadings
on other factors. (Counting within
±.13, they reach acceptable percent-
ages of 65, 73, 77, and 54 in the hy-
perplane.)

5 The matrix containing the correlations
among factors, the lambda matrix, and the cen-
troid for Table 2 have been deposited with the
American Documentation Institute. Order
Document No. 6570 from ADI Auxiliary Pub-
lications Project, Photoduplication Service, Li-
brary of Congress, Washington 25, D. C., re-
mitting in advance $1.75 for microfilm or
$2.50 for photocopies. Make checks payable
to: Chief, Photoduplication Service, Library
of Congress.
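The hyperplane counts quoted above are simple tallies. Each is the percentage of a factor's loadings that fall inside a band of ±.13 around zero. A minimal sketch follows, with a hypothetical function name and an invented column of loadings (these are not values from Table 2). It shows how widening the band from ±.10 to ±.13 changes the count.

```python
def hyperplane_percentage(loadings, width=0.13):
    """Percentage of one factor's loadings lying inside the band +/- width."""
    inside = sum(1 for x in loadings if abs(x) <= width)
    return 100.0 * inside / len(loadings)


# Hypothetical column of loadings (not taken from Table 2).
column = [0.02, -0.05, 0.31, 0.12, 0.45, -0.09, 0.08, 0.27]
print(hyperplane_percentage(column))        # band of +/-.13, as in the joint L-Q rotation
print(hyperplane_percentage(column, 0.10))  # narrower band of +/-.10
```

For this column the count is 62.5% at ±.13 and 50% at ±.10. The loading of .12 moves into the hyperplane only when the band is widened.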

However, a hitherto undescribed 
phenomenon is encountered here, 
namely, the appearance of factors 
restricted to one medium, and ap- 
pearing in one or both of the separate 
media alongside, and simultaneous 
with, the appearance of the joint me- 
dium factor having the same person- 
ality meaning. This is illustrated by 
C and Cr, D and Dg (Table 2), 
wherein the real psychological factor 
(C or D), loading the four essential 
variables across both media, carries 
alongside it an incomplete image of 
itself in each medium. The incom- 
plete image loads only the two vari- 
ables which belong in one medium. 
To these patterns, occurring simul- 
taneously with the combined pat- 
tern, I have tentatively given the 
name “refraction factors,” since they 
are analogous to what would be seen 
if one looked at an object both di- 
rectly and refracted through a prism 
of another medium, one on each side 
of the line of vision. 

Actually Table 2 does not simul- 
taneously present all refraction fac- 
tors for all real factors, but this 
should not disturb us any more than 
the failure of a single archeological 
digging to provide all the bones of a 
skeleton or all cultural elements for 
a given period. For, as it has been 
argued elsewhere (Cattell, 1958) any 
matrix typically has strictly as many 
dimensions as variables, and prob- 
ably even more hyperplanes, i.e., one 
is always taking a selection in simple 
structure among more possible hy- 
perplanes than one has chosen to ex- 
tract factors. Further search should 
be made for refraction factors, there-
fore.

A vital empirical question affecting 
further inference at this point con- 
cerns the correlations among the real 
and refraction factors for a given
psychological dimension. We had 
expected them to be positively cor- 
related, but the best estimate from 
existing data is that they are only 
slightly correlated, if at all. It is pos- 
sible, however, that if more dimen- 
sions had been taken out their cor- 
relations would have been increased 
(see Diagram 5, Cattell, 1958). 

Exploration and evaluation of pos- 
sible hypotheses to account for re- 
fraction factors would require at 
least an article to itself. One does not 
go too far in interpretation, however, 
to say that they imply that each indi- 
vidual, in addition to his assessment 
on the real factor, gets a “bonus”
on the variables peculiar to each 
medium, which is substantially un- 
related to his status on the real 
factor. Our hypothesis is that these 
refraction factors belong to the per- 
ceptual class (Class 4 on page 166 
above) and arise from the behavior 
in question being differently per- 
ceived in the two media. In self-rat- 
ing a varying sensitivity and self- 
awareness—only in special cases a 
function of the trait being rated— 
could provide the differing “bonus” 
from person to person. The differing 
visibilities of these individuals from 
the position of the rater, giving the 
L-data refraction, would be expected 
to be quite unrelated to the order of 
their individual sensitivities in self- 
rating. 

If this is correct one might also ex- 
pect the lesser loadings, on variables 
other than the two salients, to be 
systematically different on the two 
refraction factors. For example, the 
rating by others, in the case of a 
factor much concerned in delin- 
quency, might impart something of 
the stereotype of a scoundrel, where 
the Q-data refraction factor might 
convey more of a good person in diffi- 
culties. Since our main concern is
with the order which emerges, little
has been said of the “debris” factors,
notably 13 and 16 in Table 2. But our
conclusion, tentatively, is that “eval-
uative” and “visibility” factors other
than refraction factors are at present 
run together in the insufficient factor 
space so far used, and that, especially 
in the L data, these “halo” and re- 
lated factors are substantial. They 
do not appear to be any known sec- 
ond-order factors, which can some- 
times appear in inadequate first- 
order factorings. It has sufficed for 
our present investigation simply to 
set them aside. But if closer re- 
search scrutiny in this heap shows 
that our present indications are cor- 
rect that these Class 4 perturbers are 
much larger in L than Q data, then 
the practice of trying to force ques- 
tionnaire factors to align with rating 
“criteria” comes still more in ques-
tion than it is today.

That the reader may more directly 
evaluate the nature and quality of 
the simple structure in Table 2 we 
have set out in Figure 2 a plot of two 
psychological (“real”) factors there-
from.


SUMMARY AND CONCLUSION 


1. Correlations among primary per-
sonality factors in different media do
not provide a simple pattern of one-
to-one relations, and fall decidedly
short of unity between two factors
of the same apparent psychological
meaning.

2. The theoretical possibilities and 
the natural occurrences of perturbing 
influences hiding true alignment have 
been discussed and demonstrated. 
They have been classified as (a) test
instrument factors; (b) actual trait
modification by differing experience
in subareas, requiring unity to be
sought at a higher order level; (c)
differences of density of representa-
tion of variables in different media;
and (d) perceptual-evaluation-pro-
jection factors, occurring where hu-
man transmission of observations is
involved.

3. Experimenters, especially when
leaving rotation decisions to falsely
founded analytical computer pro-
grams, commonly miss instrument
factors, but when these are properly
isolated and set aside by careful ex-
ploration it is possible to find the well
known primary personality factors,
each appearing as a single factor ex-
pressing itself in both L and Q media.

4. Regard for instrument and sec-
ond-order/first-order factor relations
[...] producing clarity and [...] in
personality structure [...]; but much
remains to be explored regarding at
least four forms of distortion which
apparently occur where human trans-
mission is involved, i.e., in L and Q
data. The new phenomenon of refrac-
tion factors particularly calls for
intensive research.

5. One must distinguish between 
the question “Does a single simple 
structure factor exist loading varia- 
bles of the same meaning on both 
media?” and “Can one get a perfect 
correlation between estimates of ap- 
parently (by meaning) the same 
factor, made in the two media?” Even 
when the answer to the first, so im- 
portant for personality theory, is 
“Yes,” as this paper claims to have 
shown, the answer to the second re- 
mains “No.” The variance due to 
instrument factors, refraction factors, 
and any evaluation-perceptual fac- 
tors peculiar to one medium will re-
main with and confound the estimate
of a factor from that medium. Possi-
bilities exist, by ipsative scoring and
discriminant function methods, of im-
proving the correlation between esti-
mates of the same factor made in
two different media, and a path has
been opened above toward a proper
estimation of the correction for at-
tenuation that can be applied to see
if the correlation could be unity.
But these developments await re-
search.
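The paper does not say which form the correction for attenuation would take. As an illustration we assume Spearman's classical formula, r_corrected = r_xy / sqrt(r_xx * r_yy), and use hypothetical correlations and reliabilities. The function name is also hypothetical. This correction removes only unreliable (error) variance. Reliable variance peculiar to one medium, such as instrument or refraction variance, would still remain in the corrected estimate.

```python
from math import sqrt


def disattenuated_r(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation: estimated true-score correlation.

    r_xy: observed correlation between the L-data and Q-data factor estimates.
    r_xx, r_yy: reliabilities of the two estimates (assumed known here).
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical figures for two estimates of "the same" factor.
print(round(disattenuated_r(0.45, 0.75, 0.60), 3))  # 0.671
print(round(disattenuated_r(0.60, 0.75, 0.48), 3))  # 1.0: unity after correction
```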
COMMENTS ON CATTELL’S PAPER ON
“PERTURBATIONS” IN PERSONALITY 
STRUCTURE RESEARCH 


WESLEY C. BECKER 
University of Illinois 


Cattell’s reply to my earlier paper 
(Becker, 1960) questioning the valid- 
ity of published statements of a one- 
to-one matching between L-data and 
Q-data factors concedes the inaccu- 
racy of those statements (see the first 
point in his summary). However, in 
the process of developing a defense 
for his basic theoretical position, Cat- 
tell has distorted the nature of my 
arguments to the point that a further 
brief clarification is needed. 

Cattell states several times in his 
paper (Cattell, 1961) that since the 
evidence did not support his theory, 
I concluded that the evidence dis- 
proved his theory. In rebuttal I 
need only quote two sentences from 
my earlier paper. 

It is apparent that the present evidence
does not support the claim for “secure link-
age” of BR and Q factors. This does not nec-
essarily imply that future research using
more reliable and factor pure measures may
not still prove Cattell’s proposition to be cor-
rect (p. 208).


My critique was based on a ques- 
tion of fact, not of theory. Cattell 
has conceded this question of fact, 
as he must, but then he sets up for 
attack a question of theory which I 
did not raise. I did go on to indicate 
on logical grounds why I felt com- 
plete confirmation of his theory was 
exceedingly unlikely, and I see noth- 
ing in his present paper to change 
this opinion. The demonstration of a 
few “matchings” in the extraversion 
area, where on psychological grounds 
one would most expect self-percep- 
tions and behavior ratings to overlap, 
can hardly be accepted as firm evi- 
dence for his general theoretical posi- 
tion. 
CATTELL REPLIES TO BECKER’S “COMMENTS”


RAYMOND B. CATTELL 
University of Illinois 


In his additional comments, Becker 
expressly concedes that my theory 
has not been disproved. It is still 
odd that he objects to my saying that 
he considered the theory untrue, since 
he again says that it is “exceedingly 
unlikely,” and, to a scientist, “true”
and “untrue” mean “highly prob-
able” or “highly improbable,” at
least since the time of Victorian
physics.

The positive conceptual and ex- 
perimental contributions of my paper 
appearing since his comments, he 
either misses or ignores, since they 
show: (a) that it was impossible for 
him to reach any intelligible conclu-
sion on the theory without recogniz-
ing and developing the necessary 
corrections for attenuation and per- 
turbation, and (b) that the facts 
which he says I must and do recog- 
nize are those chosen by Becker from 
experiments with older techniques. 
Science moves on, and the new facts 
which I present from technically 
more advanced designs show that the 
same factor simultaneously loads on 
the hypothesized markers for both the 
rating and the questionnaire factors. 
His statement that I concede his facts 
is therefore ambiguous. 
METHODOLOGY AND RESEARCH ON THE PROGNOSTIC
USE OF PSYCHOLOGICAL TESTS


SAMUEL C. FULKERSON and JOHN R. BARRY¹
Western Psychiatric Institute and Clinic, University of Pittsburgh 


There has not been a general re-
view of the use of psychological tests 
in prognosis since Windle’s review in 
1952. At that time Windle concluded 
that, (a) it appeared to be some char- 
acteristic of the patient rather than 
the therapy given which determined 
the outcome of mental illness; (b) 
most studies in the area were difficult 
to interpret due to inadequate specifi- 
cation of one or more of the following: 
the sample characteristics, the treat- 
ment schedule, the criteria of im- 
provement, and the degree of control 
imposed on variables influencing out- 
come; (c) the necessary step of cross- 
validation was usually omitted; and 
(d) personality tests, including the 
projective tests, had shown little 
promise in predicting outcome. 

The purpose of the present article
is to bring the review of the research 
on the prognostic use of tests up-to- 
date and to deal with some related 
methodological issues. The scope and 
organization will depart from that
used by Windle. Firstly, the present
review covers a wider range of cri-
teria. Windle considered primarily
the prediction of improvement.
However, there seems to be a
set of criteria which are closely
related, logically and in practice, and
articles have been included dealing


1 [...] is due the criticism and
suggestions of Charles Windle and Joseph Zubin.


with a variety of criteria other than 
improvement. Secondly, the organ- 
ization will differ from Windle’s. He 
centered his review around individual 
tests, taking each test in turn and 
citing all prognostic studies where it 
had been used. The present paper is 
organized around the predictive prob- 
lem rather than the individual test, 
since in practice the clinician wants 
to know how to come to a decision 
about a patient rather than what can 
be done with a given test. It is hoped 
that this emphasis will help to point 
up which questions are involved in 
the area of prognosis, and the relative 
attention each has received in re- 
search. And finally, the emphasis on 
decisions reflects an interest in deci- 
sion theory (Luce & Raiffa, 1957), 
which has recently been suggested 
(Cronbach & Gleser, 1957) as a 
promising frame of reference from 
which to regard psychodiagnostic 
testing. 

Windle included studies from as
early as 1926 through 1951. The pres-
ent review mainly covers the period
from 1952 through June 1959. The
coverage is more complete for those
sections dealing explicitly with the
prognostic use of tests than for the
sections on methodological problems.
Only the major psychological and
psychiatric journals have been re-
viewed exhaustively.
METHODOLOGY


The methodological difficulties in 
research on prognosis concern the re- 
searcher's decisions as to what sam- 
ples he will use, what selection in- 
struments he will apply to the sam- 
ple, and what criteria seem most 
appropriate. 


Sample Attributes 

One of the primary methodological 
difficulties has been the definition of 
the sample. Psychiatric diagnosis 
appears to have been the predomi- 
nant basis of sample definition in 
spite of the known unreliability of 
these categories. Attributes of the 
sample such as age, education, sex, or 
socioeconomic status are usually 
listed. However, little attention has 
been paid in most of the studies re- 
viewed to achieving homogeneous 
samples or subsamples. Some in- 
vestigators have worked with only 
one diagnostic group, mainly schizo- 
phrenics. Since schizophrenia is a 
diagnosis given to over 50% of un- 
specified functional psychotic dis-
orders, the difference between the re-
sults from such studies and those
using psychotics sampled at random
is hard to determine.

The need for homogeneous samples 
is clearly pointed up by a considera- 
tion of the question of base rates. 
Meehl and Rosen (1955) said, “a
psychometric device, to be efficient, 
must make possible a greater number 
of correct decisions than could be 
made in terms of the base rate alone” 
(p. 194). Studies of base rate as a 
function of diagnostic category 
(Langfeldt, 1956; Pascal, Swensen, 
Feldman, Cole, & Bayard, 1953; 
Rennie, 1953) indicate wide fluctua- 
tion in outcome between categories. 
Examination of these base rates indi-
cates that a sample of psychotic pa-
tients with a preponderance of manic-
depressives would have a higher base
rate of improvement (approximately
68%) than a sample consisting of
schizophrenics (approximately 50%).
A predictor, even though its actual 
validity was zero, could do much 
better predicting outcome in the 
first sample than in the second if the 
cutting point of the predictor was ad- 
justed to take advantage of the per- 
centage of improvement. The opti- 
mal chance percentage of correct pre- 
diction in the first sample (achieved 
by calling everyone improved) would 
be 68%, which is equal to the base
rate. With a sample of any size this
would differ significantly from 50%, 
the value likely to be designated as 
chance if one did not know the base 
rate. And if the relative effectiveness 
of a predictor in the two samples were 
tested it could appear, spuriously, 
that the predictor was 68% correct 
within one sample but only gave 50% 
correct prediction in the other. Thus, 
prognostic research designs which 
compare the results in the experi- 
mental group against statistical 
chance, or which compare two small 
groups that are not sufficiently 
matched on variables related to base 
rates, cannot result in useful informa- 
tion. Since effective handling of the 
problem of sample homogeneity is un- 
common in the prognosis studies re-
viewed by Windle and ourselves, the
generality of findings is low, or at 
best difficult to determine. 
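The base-rate argument reduces to a little arithmetic. Without any valid predictor, the best attainable accuracy is the larger of the base rate and its complement. Meehl and Rosen's condition requires a device to beat that figure, not 50%. The minimal sketch below uses the approximate improvement rates quoted above. The function names and the 65% device are hypothetical.

```python
def best_chance_accuracy(base_rate):
    """Hit rate from predicting the more common outcome for every patient."""
    return max(base_rate, 1.0 - base_rate)


def beats_base_rate(hit_rate, base_rate):
    """Meehl and Rosen's efficiency condition: more correct decisions
    than could be made from the base rate alone."""
    return hit_rate > best_chance_accuracy(base_rate)


manic_depressive_sample = 0.68  # approximate improvement rate quoted above
schizophrenic_sample = 0.50

print(best_chance_accuracy(manic_depressive_sample))  # 0.68
print(best_chance_accuracy(schizophrenic_sample))     # 0.5
# A hypothetical device with 65% hits looks better than "50% chance",
# yet is worse than calling every manic-depressive patient improved.
print(beats_base_rate(0.65, manic_depressive_sample))  # False
```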

It has been assumed that homo-
geneous sampling represents an effec-
tive way of solving the problem of
sample definition. However, there is
one danger. If the basis on which the
homogeneity is established is highly
related to the criterion variable, the
variability of the criterion will be re-
stricted. This can of course obscure a
relationship that might exist between
a predictor and criterion. It has been
tacitly assumed that adequate ran-
domization is not easily achieved in
prognosis research, considering the
usual sample size and the biases of
the clinical populations from which
they are drawn; otherwise random
sampling would be an efficient way to
select, and thus operationally define,
the sample.
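The danger of restricting the criterion can be shown with hypothetical numbers. The sketch below uses invented data for ten patients. `pearson_r` is written out so that no library is needed. A homogeneity criterion closely tied to outcome would keep only cases whose outcome lies in a narrow middle band. Applying that selection drops the predictor-criterion correlation from about .79 to about .12.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical predictor scores and outcome ratings for ten patients.
predictor = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
outcome = [3, 0, 5, 2, 7, 4, 9, 6, 11, 8]

full_r = pearson_r(predictor, outcome)

# "Homogeneous" subsample chosen on a basis tied to the criterion:
# only patients whose outcome falls in a narrow middle band are kept.
kept = [(p, o) for p, o in zip(predictor, outcome) if 4 <= o <= 7]
restricted_r = pearson_r([p for p, _ in kept], [o for _, o in kept])

print(round(full_r, 2), round(restricted_r, 2))  # 0.79 0.12
```

The relationship between the two variables is the same in both cases, since every outcome is the predictor score plus or minus 2. Only the range of the criterion differs.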
Another difficulty with diagnosis
as a basis for the definition of the
samples is that it represents clinical
judgments which are based upon an
often uncertain weighting of situa-
tional and response variables. For
instance, the diagnosis of depressive
reaction typically requires a differ-
entiation as to whether the affective
response reflects anxiety or depres-
sion, and a decision concerning the
degree to which the affective response
is related to a currently stressful
situation. Clearly clinicians can vary
as to the relative emphasis they place
on these variables; and, as several
studies (Glass, Ryan, Lubin, Reddy,
& Tucker, 1956; Gleser, Haddock,
Starr, & Ulett, 1954) have shown,
they do vary in their weighting pro-
cedures. Therefore, despite the con-
venience of using diagnosis as a sam-
ple-defining operation, it is weak in
that the researcher loses some control
over the basic stimulus and response
elements upon which the judgment is
based. Studies of the effects of these
elements on test validity and on the
efficiency of cutting scores are called
for; this kind of research, frequent in
personnel psychology, seldom ap-
pears in the clinical journals.


Tests 


Here the primary difficulty has
been to define that universe of test
behavior related to outcome. The
majority of studies cited in Windle’s
earlier review used standard tests,
e.g., Rorschach, TAT, MMPI. It is
likely that these tests tap only a
small part of the response spectrum. 
With the welter available, it is still 
far from clear how many separate 
functions they sample, and no defini- 
tive taxonomy of tests exists. Zubin 
and his co-workers (Burdock, Sutton, 
& Zubin, 1958; Burdock & Zubin, 
1956; Zubin, 1958, 1959) have pro- 
posed five broad categories of activity 
to which test behavior can be as- 
signed: physiological, sensory, per- 
ceptual, psychomotor, and concep- 
tual. Each of these five categories 
has been further subdivided into 
classes of stimuli and responses. For 
the most part, the prognostic meas- 
ures which Zubin has selected to use 
within each category are simpler than 
such tests as the Rorschach, in the 
sense that they present fewer stimu- 
lus dimensions and require less elabo- 
rate and lengthy responses. Such 
systems of categorization may indi- 
cate a range of tests available, but 
within each category there is a de- 
gree of complexity which at this time 
is largely unknown. However, factor 
analyses have been carried out in the 
areas of perception (Thurstone, 
1944), psychomotor tests (Fleishman 
& Hempel, 1954, 1956; Hempel & 
Fleishman, 1955; Seashore, Buxton, & 
McCollum, 1940), and cognition 
(Guilford, 1956, 1959). Such anal- 
yses afford at least a partial basis for 
rational test selection. 

Since a number of studies (e.g.,
Conrad, 1954) indicate that severity 
of mental illness is a significant prog- 
nostic variable, researchers looking 
for simple tests for prognostic studies 
may find it of value to consider 
studies in the area of differential diag- 
nosis. H. E. King (1954) was able to 
differentiate between chronic schizo- 
phrenics, subacute behavior dis- 
orders, and normals using psycho- 
motor tasks; and Eysenck, Granger, 
and Brengelmann (1957), with
groups similar to those used by King, 
found a large number of both simple 
and complex perceptual tests which 
discriminated between their groups. 
Rabin and G. F. King (1958) re- 
viewed studies dealing exclusively 
with schizophrenia, and concluded 
that “relatively high discriminatory 
power... has been obtained with 
simple experimental tasks. In many 
cases it has been as good as or better 
than that found with more complex 
tasks” (p. 253). 


Criteria 


There are three broad aspects of 
prognosis in mental illness: duration, 
course, and outcome. Studies pre- 
dicting duration have used criteria 
such as length of hospital stay, the 
amount of time spent on the admit- 
ting or disturbed ward before transfer 
to a less disturbed ward (Gordon, 
Lindley, & May, 1957), and length 
of treatment. 

Criteria involving the course of ill- 
ness include measures of termination 
and relapse. In inpatient settings 
premature termination has been de- 
fined as leaving the hospital against 
medical advice; in outpatient settings 
it has been variously defined as not 
appearing for the initial interview 
after making an appointment, drop- 
ping out of therapy before some stipu- 
lated minimal number of contacts, or 
dropping out of therapy against the 
wishes of the therapist. 

All criteria of improvement have 
been classified in this paper as meas- 
ures of outcome. It could be argued 
that change over time is a measure of 
the course of illness, but this category 
has been reserved for specific qualitative
aspects of change. Current criteria of
improvement present the same difficulties
in definition, and for this reason it is
convenient to deal with them together, and to separate
them from termination and relapse 
criteria. Because there is no univer- 
sally agreed upon definition of the 
term “mental illness” (Jahoda, 1958;
Scott, 1958), there has been a con- 
comitant lack of clarity about how to 
measure its alleviation. Three 
sources of improvement criteria are 
common: (a) ratings of improvement 
made by the therapist, the patient, or 
other persons in contact with the pa- 
tient such as relatives, professional 
staff other than therapist, or even 
fellow patients; (b) changes in objec- 
tive measures of functioning such as 
physiological changes, or improve- 
ment in psychological test perform- 
ance; and (c) follow-up data of a be- 
havioral nature, such as whether the
patient is able to get and hold a job, to
get or remain married, or, in what- 
ever way, to resume a minimally in- 
dependent social existence. 

There have been several attempts 
to systematize these various outcome 
measures. An early breakdown of the 
separate areas of behavior which 
should be evaluated was made by 
Knight (1941). He suggested that 
therapists look for change in these 
five areas of adjustment: the disabl- 
ing symptoms or problems, the inter- 
personal relations, the sexual adjustment,
the productivity (i.e., the abil-
ity to work effectively and to utilize 
available energy), and the ability to 
handle stress. In Zubin’s classifica- 
tion of tests, the ability to handle 
stress is viewed as a general param- 
eter which might apply to the other 
four areas.

Barron (1953b) listed five similar 
criteria of improvement: (a) the patient
feels better, as indicated by introspective
comments by the patient; (b) the patient
relates better to others, which requires a
follow-up at work, school, or home, and is
often based on
reports of members of the patient’s
social group; (c) the patient’s symp- 
toms clear up—as measured by 
psychiatric ratings of improvement 
at discharge, as well as indirectly by 
measures of duration, e.g., length of 
hospital stay and speed of transfer to 
minimum security wards; (d) the pa- 
tient makes decisions in a health- 
tending direction; and (e) the pa- 
tient’s verbal behavior shows in- 
creased “insight.” 

A few other criteria have occasion- 
ally been proposed to supplement 
these. Winder (1957) has suggested 
changes in the adjustment of children 
of the patient, and Morse (1953) has 
proposed accessibility to psycho- 
therapy. Reznikoff and Toomey 
(1959) list in detail a variety of at- 
tempts to provide a taxonomy of out- 
come criteria. 

There are measurement problems
in all of these approaches. Scott 
(1958) has pointed out several con- 
ceptual and methodological difficul- 
ties in the various definitions of 
mental health. His discussion can be 
applied to Barron’s criteria of im- 
provement in mental health: (a) 
apparent change in subjective feel- 
ings or symptomatology can be a 
function of change in environmental 
conditions or can be distorted by defense
mechanisms; (b) difficulties in
social relationships can be a function
of the differing requirements of
socioeconomic and cultural systems,
and can change as the patient
changes his community or his contacts
in the community; (c) there can
be disagreement over which is a
health-tending direction, since value
systems are frequently involved; and
(d) change in insight may be a function
of the degree to which the patient
is willing to conform to the
theory and values of the therapist. It
should be noted that these points
need not be regarded as criticisms of
the definitions. If, for instance, 
changes in subjective feelings are con- 
sidered important in their own right, 
then changes in feelings, whether due 
to environment or defense mecha- 
nisms, are still of interest. However, 
when used as criteria, such changes 
are meant to reflect specific intra-in- 
dividual changes that are inde- 
pendent of environmental or irrele- 
vant personal factors. Despite this, 
most of the research in prognosis 
seems designed to demonstrate only 
that characteristics of the patient 
exist which relate to outcome, with- 
out controlling sufficiently for the 
above mentioned environmental and 
personal factors. 

On a less general level, Parloff, 
Kelman, and Frank (1954) have 
listed several common sources of 
ambiguity in improvement criteria: 
(a) improvement is often treated as a 
unitary concept, but this may be 
erroneous; (b) the emphasis of the 
rater can interact with aspects of the 
treatment—for instance, symptoms 
typically disappear before insight 
occurs, so that a rater who requires 
signs of insight before he gives a 
rating of improvement will judge 
fewer patients to be improved than 
one who accepts symptom allevia- 
tion as improvement; and (c) im- 
provement is likely to be overesti- 
mated, since patients fluctuate in be- 
havior, and at any given time signs of 
improvement in one or more specific 
areas are likely to be present and 
thus overvalued by a judge being 
asked to make a global, subjective 
rating. Pascal and Zax (1956) crit- 
icize the usual gross “improved-un- 
improved” criterion on the grounds 
that it is not sufficiently tailored to 
the specific desired changes of the pa- 
tient. They reject all nonbehavioral 
criteria of improvement, and essentially
appear to feel that symptom change
should be the primary criterion
of improvement.

It would be valuable to know the 
factorial structure of the above 
course, duration, and outcome meas- 
ures. While no study was found 
which attempted to do this, several 
reported intercorrelations between 
two or more prognostic criteria. 
These will be described separately for 
the kinds of criteria involved. 

Correlations between outcome meas- 
ures. Kelman and Parloff (1957) 
intercorrelated a number of measures, 
including ratings of comfort and self- 
awareness made by the patient, and 
social effectiveness ratings made by 
persons close to the patient as well as 
professional observers. The change 
in rating from pretherapy to 20 weeks 
after the initiation of therapy was de- 
termined. Only 1 of 21 intercorrela- 
tions between these measures of 
change was found to be significant. 
However, the correlations were based 
on an N of only 15, and the period of 
time was perhaps too short to expect 
more than minimal changes. 
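
The small N is worth dwelling on. As a rough sketch, the Fisher z approximation (used here for convenience; an exact test would use the t distribution) gives the smallest correlation that reaches two-tailed significance at the .05 level with 15 cases, together with the number of significant results expected by chance among 21 tests at that level. Treating the 21 intercorrelations as independent tests is a simplifying assumption, and the function name below is illustrative only.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Approximate smallest |r| that is significant (two-tailed) with n cases.

    Uses Fisher's z transformation: z = atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3), so the critical r is tanh(z_crit * SE).
    """
    if n <= 3:
        raise ValueError("need n > 3")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


# Kelman and Parloff's situation: 15 cases, 21 intercorrelations.
n_cases, n_tests, alpha = 15, 21, 0.05
print(f"critical r with N={n_cases}: {critical_r(n_cases):.2f}")
# Expected count of "significant" results if every true correlation were zero.
print(f"significant results expected by chance: {n_tests * alpha:.2f}")
```

Under these assumptions, no correlation much below .5 could reach significance with 15 cases, and one significant result in 21 is roughly what chance alone would produce, so the single significant intercorrelation says little either way.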

Storrow (1959, 1960) compared 
ratings of improvement made by 
therapists, patients, relatives of the 
patient, and a psychiatrist who had 
access only to abstracted material. 
Two related rating clusters were 
found: the patient's self-rating, the 
relative’s rating, and the rating made 
by inexperienced therapists (third 
year medical students) formed one 
cluster, and the experienced therapist
and the nontherapist psychiatrist
formed the other. The correla-
tions within clusters ranged from .61 
to .79; between clusters, .32 to .57. 
These two clusters seemed to reflect 
primarily a dichotomy between pa- 
tient and experienced therapist, since 
the relatives, and apparently the in- 
experienced therapists, gained their 
impression from hearing the patient's
views of his progress, while the
nontherapist psychiatrist obtained his
knowledge from the file written by 
the therapist. Storrow had the rat- 
ings made separately for each of 
Knight’s (1941) five areas, and the 
average intercorrelation between 
areas was approximately .60. Ells- 
worth and Clayton (1959) found that 
a measure of ward adjustment at discharge
correlated significantly (.47)
with a 3-month follow-up rating of
community adjustment. However, 
amount of psychopathology at dis- 
charge had no relationship to the 
follow-up criterion. Their finding can 
be compared with the intercorrela- 
tion of .57 reported between two 
simultaneous ratings of adjustment 
made on different scales (Stilson, 
Mason, Gynther, & Gertz, 1958). 
Patient expressions of positive and 
negative feelings have been used as 
evidence of improvement (see Auld & 
Murray, 1955, for a review of these 
measures). Barry (1950) found low 
but significant correlations between 
these so-called internal or feeling cri- 
teria and global judgments of im- 
provement in adjustment. Rogers 
and Dymond (1954) have found that 
changes in patient self-ratings on
Q sorts correlated with ratings and 
other criteria of improvement. In an 
analogous group research program,
Snyder (1953) reported that self-rating
changes correlated significantly
with judgments of improvement.
The same results have been
reported by Kalis and Bennet (1957). 
Taylor (1955) found that self-ratings 
(Q sorts) tend to become increasingly 
positive simply with the passage of
time. This suggests that it is imperative
to control for time in treatment
in order to evaluate the actual extent
of the relationship between self-ratings
and other improvement criteria.
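
Controlling for time in treatment can be done statistically as well as by design. A minimal sketch, assuming that Pearson correlations among self-rating change, judged improvement, and time in treatment are available, computes the first-order partial correlation of the first two with time held constant. The correlation values are hypothetical and are not drawn from Taylor or any other study cited here.

```python
from math import sqrt


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical values: x = change in self-rating (Q sort),
# y = judged improvement, z = time in treatment.
r_self_judged = 0.50
r_self_time = 0.60
r_judged_time = 0.50

print(round(partial_r(r_self_judged, r_self_time, r_judged_time), 3))
```

With these invented values an apparent correlation of .50 falls to about .29 once the shared association with time is removed; whether real data shrink this much is an empirical question.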

Correlations between duration and
outcome. Ullman (1957) reported that 
a measure of length of hospital stay 
correlated .36 (N =72) with a meas- 
ure of adequacy of interpersonal 
relationships (Palo Alto Group 
Therapy Scale), those rated most 
adequate after a period of group 
therapy being the ones with the short- 
est hospital stay. Pascal et al. (1953) 
found a correlation of .37 (N=486) 
between length of hospital stay and 
ratings of improvement made a year 
after discharge; again, the greater the 
improvement, the shorter the hospi- 
tal stay. 

A significant positive relationship 
has been frequently reported (Bailey, 
Warshaw, & Eichler, 1959; Myers & 
Auld, 1955; Seeman, 1954; Sullivan, 
Miller, & Smelser, 1958) in which 
greater length of psychotherapy in 
outpatient settings is accompanied
by judgments of greater improve- 
ment. An interesting exception to 
this is the phenomenon called the 
“failure zone.” 

D. S. Cartwright (1955) found a 
grossly linear relationship between 
the number of psychotherapy ses- 
sions and success of outcome as noted 
by the therapists; but the mean suc- 
cess rating dropped sharply for those 
whose therapy lasted from 13 to 21
interviews. Cartwright was report- 
ing on cases treated by nondirective 
techniques. Taylor (1956) validated 
this “failure zone” in a psychoana- 
lytically oriented setting. Standal 
and van der Veen (1957) obtained 
the same drop in a counseling center 
sample. Vosburg (1958), in an ex- 
amination of treatment charts, found 
evidence that from the fifteenth to 
twentieth hour was a period where 
outpatients tended to be preoccupied
with their relationship with the
therapist, suggesting that termination of
treatment in this period might
often be due to a desire on the part of
either the patient or therapist to 
avoid the close, dependent relation- 
ship which was developing. Perhaps 
supplementing this, Ends and Page 
(1959) reported that the “flight into
health” reaction occurred in group
psychotherapy uniformly around the 
fourteenth session. 

Correlation between duration and 
course. Crandall, Zubin, Mettler, and 
Logan (1954) found a significant rela- 
tionship between the duration of ini- 
tial hospitalization and rehospitalization;
patients who stayed in the hos-
pital a short time were most likely to 
still be out of the hospital on 1 to 4 
year follow-up. 

To summarize these intercorrela- 
tions, patient self-ratings and thera- 
pist ratings appear to covary to a
high degree. Although measures of 
duration and course of illness have 
some relationship to improvement 
ratings, they seem also to tap differ- 
ent sources of variance. 

Reliability. The reliability of out- 
come criteria has received attention; 
the duration and course measures are 
objective enough so that their reli- 
ability has been taken for granted. 
Miles, Barrabee, and Finesinger 
(1951) reported low interjudge but 
high test-retest intrajudge reliability 
of global judgments of improvement. 
Ten cases were rated by four judges,
with complete agreement for only 20% of the
judgments, though no disagreement 
was by more than two points. Test- 
retest figures showed 70% to 74% 
complete agreement between ratings 
taken 6 to 8 months apart. The rat- 
ings were based on structured inter- 
view material, and probably repre- 
sent the lower bounds of interjudge 
agreement, if it is assumed that rat- 
ings made after a long period of ob- 
servation of the patient would show 
more stability than ratings made on
the minimal information contained in 
a structured interview. These in- 
vestigators felt that changes in psy- 
chiatric status over time cannot be 
discriminated any more finely than 
in terms of three gross classes: un- 
changed or worse, improved, and 
markedly improved. Levitt (1957) 
presented data suggesting that 
judged improvement rate tends to in- 
crease as a function of the number of 
points on the scale. The greatest dis- 
crepancy was due to studies using a 
two-point “improved-unimproved”
scale, where the mean percentage improved
was 51. Studies using three- to
five-point scales had mean improve- 
ment rates of 73% to 76%. 
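
Agreement figures of the kind reported by Miles, Barrabee, and Finesinger can be tabulated directly from a cases-by-judges table of ratings. The sketch below assumes that agreement is counted over every pair of judges rating the same case; the original study may have defined “complete agreement” differently, and the ratings are invented for illustration.

```python
from itertools import combinations


def agreement_summary(ratings):
    """Summarize interjudge agreement.

    ratings: list of cases, each a list with one integer rating per judge.
    Returns (proportion of judge pairs giving identical ratings,
             largest absolute disagreement between any two judges).
    """
    agree = total = 0
    max_gap = 0
    for case in ratings:
        for a, b in combinations(case, 2):
            total += 1
            agree += a == b  # True counts as 1, False as 0
            max_gap = max(max_gap, abs(a - b))
    return agree / total, max_gap


# Hypothetical ratings: 3 cases rated by 4 judges on a 5-point scale.
ratings = [
    [3, 3, 4, 2],
    [1, 2, 2, 3],
    [4, 4, 4, 5],
]
prop, gap = agreement_summary(ratings)
print(f"exact pairwise agreement: {prop:.0%}, largest gap: {gap} points")
```

Reporting both numbers, as the original authors did, shows that low exact agreement can coexist with disagreements that never exceed a couple of scale points.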

A possible source of unreliability in 
judgments of improvement lies in the 
fact that they may confound the 
amount of change with the absolute 
level of terminal adjustment. Thus it 
seems likely that the reason initial 
severity of illness correlates with im- 
provement (Conrad, 1954) is to some 
extent due to the fact that those who 
are high on a measure of adjustment 
initially will be high on adjustment 
terminally, though the change may 
be far from being as dramatic as for 
patients who are admitted in a state 
of confusion and disorientation, and 
discharged without these symptoms. 
Since in most studies each judge can
combine amount of change and absolute
level as he chooses, a lowering
of interjudge agreement is to be
expected. This may be involved in the
much higher interrater reliabilities 
reported by Morton (1955) than by 
Miles, Barrabee, and Finesinger
(1951). Morton developed seven-point
scales of absolute level of adjustment
in 12 different areas. After
training, the interrater reliability
coefficients ranged from .79 to .91 when
the ratings were based only on
transcriptions of a terminal interview;
and the reliability of the improve- 
ment score (the difference between 
ratings of an initial and terminal 
interview) ranged from .59 to .78. 
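
A standard classical test theory result, offered here as an interpretation rather than as part of Morton's analysis, helps explain why the improvement score is less reliable than the level ratings it is computed from. Assuming the initial and terminal ratings have equal variances, reliabilities r11 and r22, and intercorrelation r12, the reliability of their difference is (r11 + r22 - 2 r12) / (2 - 2 r12). The input values below are hypothetical.

```python
def difference_reliability(r11: float, r22: float, r12: float) -> float:
    """Reliability of the difference score D = X2 - X1.

    Classical test theory formula; assumes X1 and X2 have equal variances.
    r11, r22: reliabilities of the initial and terminal ratings.
    r12: correlation between initial and terminal ratings.
    """
    if r12 >= 1:
        raise ValueError("r12 must be less than 1")
    return (r11 + r22 - 2 * r12) / (2 - 2 * r12)


# Hypothetical: both ratings fairly reliable (.85), varying intercorrelation.
for r12 in (0.3, 0.5, 0.7):
    print(r12, round(difference_reliability(0.85, 0.85, r12), 2))
```

The more closely initial and terminal ratings track each other, the less reliable their difference becomes, even when each rating alone is quite reliable. This is consistent with the lower coefficients Morton obtained for the improvement score, though his data are not used here.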

Tests as criteria. A possible cri- 
terion of outcome is performance as 
measured by tests. The present re- 
view uncovered no studies which 
used changes in test scores as primary 
prognostic criteria, but this remains a
reasonable possibility. The primary 
requisite for this use of tests would 
be evidence that the tests covary 
with the changes in patients that go 
to make up the concept of improve- 
ment. A number of studies have been 
published which tackle this question, 
and in general they support the as- 
sumption of covariation. 

Pascal and Zeaman (1951) found 
that the Bender-Gestalt, color-nam- 
ing, noun-naming, and serial sub- 
traction, from a larger battery of 
tests, correlated with the course of 
progress as judged clinically, for four 
patients getting electroconvulsive 
therapy. 

Hybl and Stagner (1952) reported 
a significantly greater decrease in the 
amount of disruption of performance 
brought about by a frustration ex- 
perience, for patients rated by their 
therapists as improved. The tasks 
were three psychomotor tests: the 
Ferguson Form Boards, Digit Sym- 
bol from the Wechsler-Bellevue, and 
the Minnesota Rate of Manipula- 
tion Test. 

Vinson (1952) administered a
mirror drawing test before and during
electroshock therapy to 18
patients. Change in the mirror draw-
ing score correlated .72 with change 
in orientation as evaluated by the 
clinical staff. 

Several studies (Hozier, 195% 
Wechsler, 1958) indicate that as
psychotic patients improve there is a
decrease in variability of both the
quality and the quantity of test per- 
formance. 

The MMPI has been used in a 
number of studies of change: several 
studies (Carp, 1950; Feldman, 1951; 
Schofield, 1950, 1953) have reported 
that hospitalized patients treated 
with somatic therapies show an aver- 
age drop on all of the MMPI scales of 
from 8 to 13 T-scale points. The 
acutely ill changed more than the 
chronically ill, and the affective dis- 
orders showed a greater change than 
the schizophrenics. Feldman (1951, 
1952) found that improved patients’ 
MMPI profiles dropped more than 
unimproved patients’ profiles, and 
that the averaged profiles of these 
two groups showed greater differences 
after therapy than before. Work 
with predominantly psychoneurotic 
samples (Barron & Leary, 1955; 
Kaufman, 1950; Schofield, 1950) has 
indicated a larger drop on most 
scales for improved patients than for 
those rated unimproved. Changes 
taken without regard to sign (de- 
creases as well as increases) were sig- 
nificantly greater in an individually 
treated group than in a group treated 
by group-therapy methods (Barron 
& Leary, 1955; Leary & Harvey, 
1956). 

Harris (1959) has summarized
such MMPI studies to date as
follows:

scores on the MMPI show little change in
[...] and in untreated psychiatric patients
over extended periods of time; somatic therapy,
which is known to be effective at least in
readying patients for discharge from the hospital,
is accompanied by sizeable drops in test
scores; patients in psychotherapy show smaller
changes, perhaps not much larger than
[...] by the passage of time alone;
the magnitude of change in test scores is
[...] to clinical estimates of improvement
[...]. (Quoted by permission of National
Academy of Sciences-National Research
Council)
Extraneous effects in test-retest
comparisons need to be kept in mind, 
and Windle (1954) has reviewed 
these in reference to questionnaires. 
He presents evidence for a general 
tendency toward less deviant answers 
on retest, irrespective of external 
factors. This tendency is less, the 
greater the time period between test 
administrations. But even taking 
these artifactual sources of error into 
account, there appears to be evidence 
that a variety of test responses 
change in a manner consistent with 
therapist judgments of change in 
mental health. 


RESEARCH IN PROGNOSIS 


This section is organized around 
the three elements that seem most 
prominent in any treatment: the 
treatment itself, the person adminis- 
tering the treatment, and the patient 
who receives the treatment. Dura- 
tion, course, or outcome of illness 
can potentially be affected by any 
one of these. The practical need to 
determine the prognosis of a patient 
implies that some selection is pos- 
sible concerning the most appro- 
priate treatment for that patient, or 
the most appropriate patient for a 
given treatment. Thus in the head- 
ings below we use the terms: treat- 
ment selection, therapist selection, 
and patient selection. 


Treatment Selection 

Ideally, the basic problem in prog- 
nosis is the assignment of patients to 
treatments in such a way as to maxi- 
mize the total ratio of improved to 
unimproved patients. In decision 
theory terms, the prognostic judg- 
ment is a case of decision-making 
under conditions of certainty, which 
implies that the relationships be- 
tween treatments and effects or out- 
comes are known. However, it has 
not been demonstrated that different
treatments have different effects. To 
quote an authority, 

One is reluctantly forced to admit that we 
simply do not possess the factual knowledge 
as of 1957 which permits us to say that we 
have any treatment procedure in psychiatry 
which promises a better outlook for a partic- 
ular illness than does nature left to her own 
devices (Hastings, 1958, p. 1057). (Quoted by 
permission of the American Journal of Psy- 
chiatry) 

Several attempts have been made 
to survey the literature on treatment 
effects, all of them hampered by the 
difficulties in comparing studies with 
different diagnostic groups, and 
different criteria for improvement. 
Eysenck (1952) selected 24 studies on 
the effect of psychotherapy with 
psychoneurotics, and concluded that 
these relatively homogeneous studies 
did not offer any evidence that im- 
provement rate for those receiving 
psychotherapy was greater than for 
those getting only custodial care. 
Methodological weaknesses in his 
survey were pointed out by Rosen- 
zweig (1954) and DeCharms, Levy, 
and Wertheimer (1954). 

Levitt (1957) surveyed 30 articles 
evaluating psychotherapy with chil- 
dren. He compared the improvement 
rate on discharge and follow-up for 
treated cases with that reported for 
children accepted for therapy who 
never appeared for a first interview. 
The results were similar to those 
found by Eysenck, and did not dem- 
onstrate any facilitation of recovery 
due to psychotherapy. 

Appel, Myers, and Scheflen (1953) 
summarized the results of studies 
which met a list of what they felt 
were minimal standards. They broke 
down the findings separately for 
schizophrenic, affective, and psycho- 
neurotic disorders. Their survey indicated
that none of the treatments
studied (insulin coma, electroconvulsive
shock, electronarcosis, lobotomy,
or psychotherapy) gave recovery
rates significantly greater
than that reported for groups re- 
ceiving only routine hospital care, 
in any of the three disorder cate- 
gories. A more recent review by 
Staudt and Zubin (1957) covering 
the somatotherapies indicated that 
insulin and electroconvulsive shock 
temporarily increase the improve- 
ment rate, but after 3 years the increase
has dissipated. This conclu-
sion would seem to fly in the face of 
the fact that most of the studies re- 
viewed by Staudt and Zubin reported 
significantly greater recovery for the 
treated group than for the control 
group at all periods of follow-up. 
However, the groups had differed just as
much before treatment was begun;
in most instances the control groups 
“seem to be highly selected and 
loaded with patients of apparently 
poor prognosis. Their improvement 
rates fall far short of the ‘spontane- 
ous improvement rates’ ” (Zubin, 
1959, p. 344). This bias in selection 
of control groups is also likely to be 
operating in studies of psychotherapy 
unless matching procedures are possible,
since there seems to be a feeling
in many clinics that ethical considerations
make it mandatory that patients
who appear treatable be given
treatment as quickly as possible.

Kramer and Greenhouse (1959)
discuss a point which bears directly
on the adequacy of studies in this
area. They show the statistical implications
of the common sense notion
that the less dramatic the effect
one is looking for, the larger the
sample necessary to show that it is
significant. Their tables indicate
that if one is interested in identifying
in the experimental group as slight
an improvement as 5% over the control
group (at the .05 level of significance)
for a base rate improvement of
40% (which is close to that found in
schizophrenia) it would take at least
569 cases in each group. For a base 
rate improvement of 70% (typical of 
the psychoneurotic) 472 cases per 
group would be needed to demon- 
strate a 5% increase under ideal con- 
ditions. These estimates further 
assume perfect reliability of the im- 
provement criterion. Kramer and 
Greenhouse point out that very few 
states have a large enough population 
of mentally ill to do a study with a 
sample sufficient to detect slight but 
significant effects. Thus all the 
studies on the effect of treatment 
using small samples implicitly assume 
no interest in detecting anything less 
than extremely large differences. 
This is why it has been emphasized
that treatment effects seem to be
negligible relative to other variables
in determining outcome; in view of
the size of samples for research in
this area, it would not be fair to conclude
that slight treatment effects do not
exist.
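
The arithmetic behind statements like Kramer and Greenhouse's can be sketched with the common normal-approximation formula for comparing two independent proportions. The one-sided test, alpha of .05, and default power of .80 below are assumptions chosen for illustration; Kramer and Greenhouse's tables rest on their own conventions, so this sketch is not expected to reproduce the figures of 569 and 472 exactly. It does show the two qualitative points made above: the required number of cases grows rapidly as the difference to be detected shrinks, and it is smaller at a 70% base rate than at 40%.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control: float, diff: float,
                alpha: float = 0.05, power: float = 0.80,
                one_sided: bool = True) -> int:
    """Approximate cases needed per group to detect an increase of `diff`
    over a control improvement rate `p_control`.

    Normal approximation for two independent proportions, no continuity
    correction, perfectly reliable improvement criterion assumed.
    """
    p_treat = p_control + diff
    if not 0 < p_control < p_treat < 1:
        raise ValueError("need 0 < p_control < p_control + diff < 1")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    p_bar = (p_control + p_treat) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treat * (1 - p_treat))) ** 2
    return ceil(numerator / diff ** 2)


# Base rates near those cited for schizophrenia (40%) and psychoneurosis (70%),
# for a 5-point and a 20-point improvement over control.
for base in (0.40, 0.70):
    print(base, n_per_group(base, 0.05), n_per_group(base, 0.20))
```

With power set to 0.5 and a one-sided test, the 40% case comes out at roughly 530 cases per group, the same order of magnitude as the figure of 569 quoted above; asking for the more conventional power of 0.80 more than doubles it.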

How do patients regard psycho- 
therapy? Stotsky (1956a) found that 
only 10% of a VA sample mentioned 
psychotherapy when asked to list 
any treatments which helped them. 
If asked directly whether they felt 
psychotherapy was the most impor- 
tant part of their treatment, over 
50% said yes. These patients came 
predominantly from a lower socio- 
economic class which, as will be dis- 
cussed later, would bias the results 
in the direction of more negative 
answers.

Two final points can be made. First,
it should be said that clear-cut
effects of psychotherapy seem to
have been demonstrated using the
patient's verbal behavior, rather
than judgments of improvement, as
the criterion measure (Rogers &
Dymond, 1954; Rosenthal, 1955).

Secondly, it might be pointed out 
that the inconclusive state of affairs 
regarding the effects of treatment is
not necessarily discouraging from the 
restricted point of view of the re- 
searcher. If treatment effects are 
currently less important than effects 
due to other sources of variance, 
then the researcher can ignore treat- 
ment differences in his samples and 
in the formulation of his hypotheses, 
thus considerably simplifying the re- 
search design. 


Therapist Selection 


A special aspect of treatment selec- 
tion is the question of what kind of 
therapist does best with what kind of 
patient in psychotherapy. In the 
years surveyed in this review the 
pertinent articles in this area dealt 
with such therapist variables as sex, 
vocational interests, professional affi- 
liation, and experience. 

Irrespective of cause, are there 
differences between therapists as to 
treatment results? Imber, Frank, 
Nash, Stone, and Gliedman (1957) 
compared three therapists, each of 
whom worked with 18 patients. No 
significant differences were found be- 
tween therapists, against a criterion 
of ratings of improvement in social 
effectiveness. Sullivan, Miller, and 
Smelser (1958) found neither sex, 
experience, nor profession (psychia- 
trist, psychologist, or social worker) 
to be related to either length of stay 
in therapy or to ratings of improve- 
ment. Hiler (1958a) reported signifi- 
cant differences in number of re- 
sponses on the Rorschach between 
six groups of patients (14 per group), 
each group subsequently treated by 
a different therapist. He interpreted 
this as indicating that the therapists 
differed in their ability to keep un- 
productive patients in therapy. 
Stieper and Wiener (1959) found 
significant differences between thera- 
pists as to the length of time they 
kept patients in therapy. The differ- 
ences seemed to be related to per-
sonality variables in the therapist, 
such as having high goals concerning 
very sick patients, and needing to 
feel appreciated. They took a nega- 
tive view toward this minority of 
therapists who keep patients in 
therapy for long periods: 

It seems to us likely that psychotherapeutic 
practice today contains self-defeating concepts 
which may not only be hampering to the suc- 
cess of treatment, but potentially harmful to 
its clients (p. 241). 


Betz and Whitehorn (1956) found 
differences in treatment between 
therapists who had a cumulatively 
high improvement rate with schizo- 
phrenics and therapists with a low 
improvement rate. The successful 
therapists were more active, em- 
phasized utilization of assets, un- 
derstood the meaning of the pa- 
tient’s behavior, and engendered 
more trust and confidence. They 
also differed from unsuccessful thera- 
pists in their scores on the Strong 
Vocational Interest Test. 

Myers and Auld (1955) found that 
the experienced staff in an out-pa- 
tient clinic had fewer patients quit 
against the therapist’s wishes, and 
more patients who improved, than 
the residents in the same clinic. Katz 
and Solomon (1958) concluded that 
in their sample the less experienced 
therapists tended to lose more pa- 
tients, but if the patient continued 
treatment, the improvement rate was 
as high as for the more experienced 
therapists. Strupp (1958) had 134 
residents and psychiatrists respond 
to a sound film of an initial interview. 
He interpreted his data as showing 
two types of therapists. Type I was 
positive in his feelings toward the pa- 
tient, optimistic about prognosis, and 
permissive and passive in therapy— 
and relatively inexperienced. Type 
II was more experienced, was nega- 
tive toward the patient, pessimistic
about prognosis, and active in ther-
apy (giving orders and advice, and 
venting his irritations). Strupp 
quotes Kubie (1956) on reasons for 
this increasing pessimism: Kubie 
mentions his disappointment, saying 
it is one shared by other psychoana- 
lysts, to find that with increasing ex- 
perience he did not seem to have in- 
creasing success. 

Several studies (Katz, Lorr, & 
Rubinstein, 1958; Sullivan et al., 
1958) have reported that the more 
experienced the therapist, the larger
the percentage of cases rated by him
as improved; and the less severe the
illness, the greater the likelihood of a
patient's having an experienced 
therapist. Clearly, it is advisable to 
control for severity of illness in re- 
search on therapy. Differences in 
socioeconomic level also appear to 
interact with experience. Schaffer 
and Myers (1954) studied all cases 
accepted for treatment in an out- 
patient clinic during 1 year and 
found that 
the higher a patient's social class position . . .
in the community, the greater were his chances 
of being accepted for psychotherapy, of being 
assigned to a relatively experienced therapist 
occupying a high status within the clinic, and 
of maintaining contact with the clinic (p. 88). 
(Quoted by permission of Psychiatry) 


It is apparently also likely (Winder 
& Hersko, 1955) that the higher the 
social position, the higher the likeli- 
hood that the therapist will decide on 
analytic rather than supportive
procedures.

Since the above studies did not 
control for these contaminating 
factors, it must be concluded that 
demonstration of between-therapist 
effects on outcome has not been con- 
clusively obtained. This is not 
particularly surprising, in view of
the fact that therapist selection is
just a special case of treatment selection.
Again, though, it can be said
that effects can probably be shown
against other than improvement criteria.
For instance, Rosenthal (1955)
found that the amount of benefit a 
client said he obtained from therapy 
correlated .68 with the degree of 
shift in moral values toward those 
held by the therapist, if the values 
had been talked about during psycho- 
therapy. This change would appear 
to be related to those obtained in 
laboratory studies on verbal condi- 
tioning (Krasner, 1958). 


Patient Selection: Outcome Criteria 


We turn now to the question of the 
relationship of intra-individual vari- 
ables to prognostic criteria. The 
studies will be grouped along two di- 
mensions. They will be considered 
according to the kind of criterion
used (outcome, duration, or course),
and further broken down, where
possible, in terms of the type of test
used: projective, questionnaire, or
performance (including cognitive
tests).
Nontest indicators. Before turning 
to the research using tests as predic- 
tors of outcome, it is of interest to 
survey briefly what has been found 
using nontest variables. Huston and 
Pepernik (1958) reviewed prognostic 
variables in schizophrenia, and
presented evidence that only these
variables had been firmly established as
going with favorable outcome: acute 
onset, short duration of illness prior 
to hospitalization, a precipitating 
stress, and the absence of flat or
inappropriate affect. A series of studies
under the direction of Pascal
investigated the interrelationships of
these variables within a sample of
varied psychotics. It was found that
acute onset (Swensen & Pascal,
1954b) and aggression directed
toward oneself (Feldman, Pascal, &
Swensen, 1954) related significantly
to favorable outcome when other
prognostic variables were controlled.
However, precipitating stress (Cole, 
Swensen, & Pascal, 1954), affective 
expression (Bayard & Pascal, 1954), 
and duration of illness (Swensen & 
Pascal, 1954a) did not relate to out- 
come in their sample when the effect 
of other prognostic variables was held 
constant. The generality of their 
findings is not clear, since their 
method of balancing groups for con- 
trol purposes led to their using only a 
small portion of the total sample, 
thus allowing for the possible intro- 
duction of unknown biases. 

Eskey, Friedman, and Friedman 
(1957) could not find support for the 
notion that disorientation relates to 
duration of illness; however, they 
restricted their sample on the cri- 
terion variable by not using patients 
who were unimproved at discharge. 
Several studies (Eskey & Friedman, 
1958; Phillips, 1953) indicate that 
intact cognitive processes and a 
mature premorbid social and sexual 
life go with favorable outcome. Zubin 
(1959) presents the results to date of 
an uncompleted survey of prognostic 
indicators for schizophrenia, which 
suggests that the variables defining 
reactive schizophrenia go with favor- 
able prognosis, and those defining 
process schizophrenia go with un- 
favorable prognosis. He presents a 
valuable count of articles supporting 
or negating the postulated
relationship for almost every, if not every,
prognostic indicator that has been 
investigated. There have been sev- 
eral attempts to combine these vari- 
ables into a scale. Thorne (1952) 
intuitively combined five into a 
quantified prognostic scale. More 
recently Lindemann, Fairweather, 
Stone, and Smith (1959) have devel- 
oped a somewhat similar scale and 
cross-validated it against a criterion 
of duration of hospital stay. An 
eight-point scale (Schofield, Hathaway,
Hastings, & Bell, 1954) developed
to predict a follow-up criterion
of adjustment in schizophrenia could 
not be cross-validated by Stone 
(1959). Becker and McFarland 
(1955) developed and cross-validated 
a 16-item scale against a criterion of 
improvement in a lobotomized sample.

The above studies have dealt with
psychotics, or samples predominantly 
psychotic. Miles, Barrabee, and 
Finesinger (1951) reported that in a 
hospitalized psychoneurotic sample, 
age of onset, duration of illness prior 
to hospitalization, and a number of 
symptoms were unrelated to out- 
come. Patients with symptoms asso- 
ciated with autonomic discharge were 
most likely to remit. Rosenbaum, 
Friedlander, and Kaplan (1956), 
studying an outpatient sample, found 
improvement occurred in patients 
with good premorbid history whose 
environment offered many supports; 
and improvement was mainly in 
marital and work adjustment. Com- 
parison of results on inpatient and 
outpatient samples suggests some 
reason for dealing separately with 
psychotics and psychoneurotics in 
prognosis research. 

An important question is how well 
the clinician, using these nontest 
indices, can do in predicting outcome. 
Clow (1953) obtained a majority 
opinion of prognosis at the staff con- 
ference which was held 2 months 
after admission on each of 100 female 
schizophrenics. The prognoses were 
73% correct in predicting a dichoto- 
mous improved-unimproved criterion 
obtained at discharge. More studies 
of this kind would be helpful in evalu- 
ating the practical usefulness of 
adding tests to current prognostic 
procedures. 
Projective tests. Several Rorschach
studies have used a configurational
score, the Prognostic Rating
Scale (PRS) (Klopfer, Kirkner, Wisham,
& Baker, 1951). Kirkner,
Wisham, and Giedt (1953) found a
correlation of .67 between PRS and
improvement ratings obtained by
evaluating the terminal closure notes
on a sample of 40 patients receiving
psychotherapy. Mindess (1953) obtained a
correlation of .66 (N of 70) between
PRS and a diagnostic criterion running
from normal through neurotic to
psychotic, obtained 6 months after
initiation of psychotherapy.
Filmer-Bennett (1952, 1955) did not obtain
significant results with either the
PRS or global judgments based on
the total Rorschach protocol. The
criterion was a dichotomous
improved-unimproved rating of the
degree to which the patient was
making a satisfactory social and
vocational adjustment a year after
discharge from the hospital. Rosalind
D. Cartwright (1958) presented a
review of several successful studies
using the PRS, and described further
positive results from her own study.
The criterion was ratings of success in
psychotherapy made by the counselor
after termination of therapy. In an
appended discussion of her paper,
Snyder argued that other tests might
do as good a job with much less time
needed for testing. Bloom (1956)
added an interesting modification to
his design. He divided his 46 subjects
into two groups, an unproductive
group (fewer than 11 Rorschach
responses) and a productive group (11
or more responses). The PRS
differentiated a dichotomous criterion of
outcome of psychotherapy significantly
in the productive group, but
not in the unproductive. He further
assessed 11 other scores, and found
none that was significant for the total
sample as a whole; all discriminated
significantly in one or the other of his
groups, some for the productive group
and the rest


PROGNOSTIC USE OF PSYCHOLOGICAL TESTS 


for the unproductive. His results 
suggest the operation of an interac- 
tion similar to the one Zubin and 
co-workers (see below) have reported 
between chronicity and outcome, and 
deserve further investigation. 
Rogers and Hammond (1953) and 
Roberts (1954), both working with 
VA outpatients, tried a sign approach 
on the Rorschach with negative 
results. Dana (1954) hypothesized 
that Card IV, assumed to be most 
likely to pick up attitudes to author- 
ity, would give responses related to 
improvement in psychotherapy, if the 
authority relationship was crucial to 
outcome. The responses were placed 
in three categories (“adequate,”
“inadequate,” and “negative”), and there
was a significant tendency for those
with “adequate” responses to
improve, and those with “inadequate”
responses to remain unimproved.
Hammer (1953) felt that his review of 
the literature suggested that those 
patients whose Rorschach protocols 
look sicker than their H-T-P proto- 
cols have a good prognosis, while a 
poor prognosis is associated with 
giving more negative feelings on the 
H-T-P than on the Rorschach. 
Ullman (1957) found two highly 
related measures—clinical judgments 
of TAT protocols and a social percep- 
tions test—to be correlated signifi- 
cantly with two criteria of improve- 
ment: the Palo Alto Group Therapy 
Scale and hospital status after 6 
months (hospitalized vs. discharged). 
S. Rosenberg (1954) developed and 
cross-validated eight prognostic signs 
based on the Wechsler-Bellevue, Sen- 
tence Completion, and on the Ror- 
schach. Grauer (1953) found more 
Rorschach indices of anxiety in an 
improved group of schizophrenics 
than in an unimproved group. Organic
signs did not discriminate. The 
welter of signs which these studies 
find related to improvement shows no 
clear pattern. Obviously most of
these positive findings with projective 
techniques should be further vali- 
dated before they can be accepted as 
more than promising leads. 

Questionnaires. Barron (1953b) 
reported lower pretherapy MMPI 
and Ethnocentrism scores for an 
improved outpatient group than for 
an unimproved group. The criterion 
was judgments of change in psycho- 
therapy made by professionals who 
had not been involved in the
treatment. At least some of these
relationships were due to differences in IQ
between the groups. Rosen (1954) 
was not able to verify Barron's find- 
ing with the E Scale. Barron devel- 
oped a special ego strength scale from 
the MMPI (Barron, 1953a), which he 
successfully cross-validated against 
improvement criteria in three dispa- 
rate samples. Wirt (1955, 1956) 
found the ego strength scale signifi- 
cantly discriminated an unimproved 
from a greatly improved group, the 
groups being extremes drawn from a 
hospitalized sample receiving psycho- 
therapy. The scale did a better job of 
discrimination than experienced clini- 
cians who based their judgments on 
the total MMPI profile. 

Feldman (1951, 1952, 1958) ex- 
plored the validity of the MMPI for 
the prediction of outcome after 
electroshock therapy. He found that 
items dealing with hostility and inter- 
personal relationships were predictive 
of outcome, while items dealing with 
symptomatology reflected the amount 
of improvement. Pumroy and Kogan 
(1958) were unable to cross-validate 
Feldman’s prognostic scale in a small 
VA sample. Dana (1954) also ob- 
tained negative results with the 
MMPI, attempting to predict im- 
provement after electroshock. 

Performance tests. Stotsky (1956b) 
gave vocational aptitude and interest 
tests to a group of schizophrenics 
most of whom had been in the
hospital for a year or more. The aptitude
tests predicted later work success, 
but the interest tests did not. Swen- 
sen and Pascal (1953) reported that 
the Pascal-Suttell Z score on the 
Bender-Gestalt test was significantly
lower for a group of inpatients
judged to be improved on follow-up a 
year and a half later, than for those 
judged unimproved. Landis and 
Clausen (1955) found efficient per- 
formance on critical flicker fusion, 
reaction time, finger dexterity, audi- 
tory acuity threshold, and tapping 
speed was predictive of improvement 
in an inpatient sample receiving a 
variety of treatments. A variability 
score of palmar sweating (Ellsworth 
& Clark, 1957) predicted changes in a
behavioral adjustment scale concur- 
rent with the administration of tran- 
quilizing drugs. Keehn (1955) took 
12 measures from simple cognitive 
and psychomotor tests that had been 
shown to discriminate between nor- 
mals and psychotics, and found only 
one score that predicted outcome in a 
group of inpatients receiving insulin 
coma therapy; he concluded that 
initial degree of psychoticism was not 
prognostic of outcome. 

Vinson (1952) used a mirror draw- 
ing test to predict the prognosis made 
at discharge, a dichotomous
“favorable-unfavorable” prognostic
judgment made by the staff. His sample
consisted of 18 hospitalized patients 
who received electroshock therapy. 
He tested before and during treat- 
ment, and the difference between 
these scores predicted the prognostic 
criterion at the .02 level of signifi- 
cance. 

The most promising findings made 
in prognosis in the last 10 years have 
been reports, coming out of the
Columbia-Greystone project, of two
interaction effects. The first
interaction dealt with the relation of
chronicity to outcome. Windle and Hamwi
(1953) reported that chronic patients 
who were discharged after treatment 
had poorer admission scores on a 
complex reaction time test than 
chronic patients who were not dis- 
charged. However, for acute pa- 
tients, those whose illness was of 
short duration, the reverse was true, 
namely, poor admission scores were 
associated with poor outcome. Zubin, 
Windle, and Hamwi (1953) rechecked 
data on other tests, using chronic 
patients from the same study, and 
found four other tests which gave the 
same results. An independent valida- 
tion was provided by Sonder (1955) 
using different tests. In all of these 
studies the results were most clear- 
cut for the chronic group, probably 
due to the fact that among the acute 
patients were some who were poten- 
tially or actually chronic. 

The second interaction emerged 
from the study by Zubin, Windle, and 
Hamwi (1953) who found that the 
chronic patients who did well on 
conceptual tasks (intelligence, mem- 
ory, personality tests) but poorly on 
perceptual tasks (learning and per- 
ception tests) had a poorer prognosis 
than chronic patients who showed 
conceptual confusion but perceptual 
clarity. Williams and Machi (1957), 
also working with the chronic sample 
from the Columbia-Greystone proj- 
ect, factor analyzed the test data, 
and found some support for this 
conceptual-perceptual differentiation. 
However, this finding is not yet as
clearly supported by the evidence as
the chronicity-outcome interaction.
Zubin and Windle (1954) reviewed a
number of independent prognostic
studies, and reported that a
consideration of the two interaction effects
accounted for much of the conflicting
findings. In the light of this work,
further attempts to investigate these
interactions cannot help but be of 
value. 


Patient Selection: Duration Criteria 


Projective tests. Stotsky (1952), 
working only with schizophrenics, 
compared a group of patients who in 
a 2-year period had not left the 
hospital with a group which in the 
same period of time had been dis- 
charged and remained outside for at 
least 6 months. His hypothesis was 
that the prognosis would be best for 
patients with the best pretreatment 
emotional and intellectual integra- 
tion. Of 19 Rorschach signs, 5 were 
significantly cross-validated in a
second sample. Also, all of the 19 signs
except R were found to be in the 
predicted direction in both samples. 

Questionnaires. Grayson and Olinger
(1957), in a VA inpatient sample,
reported that patients who were given
early trial visits were better able than
those still hospitalized after 3 months
to give improved MMPIs when asked to
respond in “the way a typical,
well-adjusted person on the outside would
do.”
Rapaport (1958) was not able to 
validate this finding, using a military 
sample, although the change on most 
of the scales was in the correct direc- 
tion. Stieper and Wiener (1959) 
found a group of VA outpatients who 
were seen in psychotherapy for an 
average of 5.3 years had higher pre- 
therapy scores on the MMPI scales, 
Hs and Hy, than a group who were 
discharged after 14 months. 

A demographic study (Lindemann
et al., 1959) found that an index using
marital status, diagnosis, degree of
incapacity, legal competence, and
alcohol intake as variables was
related to length of hospital stay.
Ellsworth and Clayton (1959) found that a
rating scale of psychopathology filled
out at admission did not correlate
significantly with length of hospital 
stay, but a behavioral adjustment 
scale did correlate, patients with the 
best admission adjustment tending to 
remain in the hospital the shortest 
length of time. 

Performance tests. Venables and 
Tizard (1956b) found “short-stay” 
schizophrenics performed better on a 
repetitive psychomotor task than did 
chronic schizophrenics. Reaction 
time differences (Venables & Tizard, 
1956a) occurred on initial testing, but 
disappeared on retest. 


Patient Selection: Course Criteria 


Under criteria measuring the 
course of illness we have placed two 
broad questions: who will relapse, 
and who will terminate treatment. 

Relapse. The broad question here 
is one of predicting who will get worse 
over time. It is of course the reverse 
of the question of who will improve. 
However, the prediction of
improvement and its opposite may not
necessarily be most effectively
accomplished with the same test. It cannot
be assumed that the prediction of 
relapse or hospitalization can be 
made from the same tests which 
predict improvement. This is consist- 
ent with the assumption that change 
of mental status need not be a unitary 
concept. 

Peterson (1954b) used the MMPI, 
Wechsler-Bellevue, Rorschach, and 
nontest data to predict who would 
require admission to the hospital 
from patients being seen on an
outpatient basis in a VA mental hygiene
clinic. Considering the base rates, the 
predictive power of the tests was 
slight, but the results suggested that 
the person who gets worse in therapy 
is single, has been previously
hospitalized, is diagnosed psychotic, and
has an MMPI profile strongly
elevated on the psychotic scales. Using
a six-point scale based on signs of 
psychosis on the MMPI developed by 
Meehl, Peterson (1954a) was able to 
achieve 75% correct discrimination. 
Briggs (1958) was able to cross- 
validate this scale to a certain extent. 
He took patients who were already in 
the hospital when they received the 
MMPI. On follow-up he found the 
Peterson score differentiated those 
who were rehospitalized from those 
who were not, but only for patients
originally diagnosed as having psychoneurosis or
mixed psychoneurosis. This is
consistent with Peterson’s finding that in
his study similar outpatient diagnoses 
were most often given to the cases 
which were later hospitalized. 
Schofield and Briggs (1958) related 
several measures of improvement 
previous to initial discharge to rehos- 
pitalization, the median follow-up 
period being 5.8 years. Improvement 
in behavior ratings made by nurses 
was not related to rehospitalization, 
but a combination of ratings based on 
pre- and posttreatment MMPIs and 
psychiatric evaluations of improve- 
ment made at the time of discharge 
allowed 75% correct prediction for 
the 66% of cases on which the two 
ratings agreed. Since knowledge of 
the base rate alone would allow 66% 
correct prediction, this was only 
slightly better than chance. 
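The base-rate comparison made here (and again below for Gibby et al.) can be made explicit. The following is a minimal sketch, not a procedure used in any of the studies reviewed: it computes the hit rate obtained by always predicting the more frequent outcome, the absolute gain of a test over that rate, and, as an index assumed here rather than one reported by the authors, the proportion of base-rate errors the test removes. The function names are hypothetical.

```python
def base_rate_accuracy(n_majority: int, n_total: int) -> float:
    """Hit rate from predicting the more frequent outcome for every patient."""
    if n_total <= 0 or not 0 <= n_majority <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_majority <= n_total")
    p = n_majority / n_total
    return max(p, 1.0 - p)


def gain_over_base_rate(hit_rate: float, base_rate: float) -> tuple[float, float]:
    """Return (absolute gain, proportion of base-rate errors removed).

    The second value is an index assumed for this sketch, not one used
    in the studies reviewed.
    """
    if not (0.0 <= hit_rate <= 1.0 and 0.0 <= base_rate < 1.0):
        raise ValueError("rates must be proportions, with base_rate < 1")
    gain = hit_rate - base_rate
    return gain, gain / (1.0 - base_rate)


if __name__ == "__main__":
    # Schofield and Briggs (1958): 75% correct against a 66% base rate.
    gain, removed = gain_over_base_rate(0.75, 0.66)
    print(f"gain = {gain:.2f}, base-rate errors removed = {removed:.0%}")
```

With the Schofield and Briggs figures the gain is 0.09, and only about a quarter of the errors made by predicting from the base rate alone are removed; the 68% versus 60% result of Gibby et al. gives a gain of 0.08 and removes one fifth.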
Cowden, Deabler, and Feamster 
(1955), using a criterion of whether 
the patient was rehospitalized within 90
days after discharge, reported judg- 
ments of change from admission to 
discharge on Sentence Completion 
and the H-T-P Test predicted the 
criterion. An “ego” score obtained 
from combining the Binet Vocabu- 
lary with Cards I, III, and VIII of 
the Rorschach predicted relapse 
within a 2-year period for a sample of 
discharged patients (Orr, Anderson,
Martin, & Philpot, 1955), but did not
predict discharge for a sample of
nondeteriorated admissions. Working
with a special group (outpatients con- 
sidered interminable) Wiener (1959) 
studied return to psychotherapy over 
a 6-month period after initial psycho- 
therapy was arbitrarily terminated. 
In his sample of 48, 37 returned for 
further therapy within this period. 
The MMPI did not discriminate 
returnees from nonreturnees. Months 
in treatment appeared to be a promis- 
ing measure, with the returnees 
having a longer history of psycho- 
therapy. 

A study that fits under neither of 
our two course criteria is one by 
Rioch and Lubin (1959). They ob- 
tained lengthy follow-up data on 93 
patients, sufficient to allow an assess- 
ment on an 11-point scale of how 
consistently the patient had moved 
upward or downward in his social 
adjustment over several years. Both 
the Wechsler-Bellevue IQ and a 
global rating based on the Rorschach 
correlated significantly with this cri- 
terion, mainly due to discrimination 
at the low end of the scale: all of the 
patients who deteriorated steadily 
had low scores on the predictors. 

Termination of treatment. The 
criterion involved in the prediction of 
length of therapy is more objectively 
determined than improvement, but 
there are some difficulties in its deter- 
mination nonetheless. 

One question is how to measure 
length of therapy. Most studies have 
used the number of interviews as the 
measure. Number of weeks in
treatment would appear to be an
equivalent measure. However, Lorr, Katz,
and Rubinstein (1958) found that the
number of interviews correlated only
.60 with number of weeks in
treatment, and they argued that number
of interviews is likely to be the less
reliable of the two.
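The size of a correlation of .60 is easier to judge as shared variance. Squaring it, as in the usual coefficient of determination, shows how much of the variation in one measure of length of therapy is accounted for by the other:

```latex
r^2 = (0.60)^2 = 0.36, \qquad 1 - r^2 = 0.64
```

The two measures thus share only about a third of their variance, so a study that defines length of therapy one way need not agree with a study that defines it the other way.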

Another problem springs from the 
research design used in most of the
studies of termination. The total 
sample is usually divided into two 
groups, terminators and remainers, 
and test scores are related to this 
dichotomous criterion. The question 
becomes one of where to cut the
distribution. Terminators have been
defined as those remaining less than 4 
sessions (Gliedman, Stone, Frank, 
Nash, & Imber, 1957), less than 10 
sessions (Auld & Eron, 1953; Kotkov 
& Meadow, 1953), or less than 20
sessions (Gibby, Stotsky, Hiler, & 
Miller, 1954). Gibby et al. (1954) 
found that those terminating between
5 and 19 sessions resembled in their test
responses those who terminated ear- 
lier rather than those continuing on 
for more than 19 sessions. Our
previous discussion of the “failure zone”
(Taylor, 1956) suggests that a variety 
of factors are operating in the first 20 
weeks. When these factors have not 
been controlled, they can influence 
the findings in termination studies. 
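The dependence of the terminator group on where the distribution is cut can be illustrated with a short sketch. The session counts below are hypothetical, invented only to show the mechanics; the three cutoffs are those used in the studies just cited (fewer than 4, 10, or 20 sessions). The function names are likewise illustrative.

```python
from collections import Counter

# Hypothetical session counts for ten clients; illustration only,
# not data from any study reviewed here.
SESSIONS = [2, 3, 5, 8, 9, 12, 15, 18, 25, 40]

# Cutoffs used by Gliedman et al. (1957), Auld and Eron (1953), and Gibby et al. (1954).
CUTOFFS = (4, 10, 20)


def classify(sessions: list[int], cutoff: int) -> list[str]:
    """Label each client 'terminator' if he attended fewer than `cutoff`
    sessions, otherwise 'remainer'."""
    return ["terminator" if s < cutoff else "remainer" for s in sessions]


def group_sizes(sessions: list[int], cutoff: int) -> Counter:
    """Count terminators and remainers under one cutoff."""
    return Counter(classify(sessions, cutoff))


if __name__ == "__main__":
    for cutoff in CUTOFFS:
        sizes = group_sizes(SESSIONS, cutoff)
        print(f"< {cutoff:2d} sessions: {sizes['terminator']} terminators, "
              f"{sizes['remainer']} remainers")
```

In this made-up sample the terminator group grows from 2 to 5 to 8 clients as the cutoff moves from 4 to 10 to 20 sessions. The six clients with 5 to 19 sessions are remainers under the first split and terminators under the third, which is why test differences found at one cutoff need not reappear at another.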

A further criticism has been made 
by Gundlach and Geller (1958) who 
suggest that termination rate and
duration of illness are partly adminis-
trative artifacts, and partly a reflec-
tion of “the kind of personality
problems that the staff are interested
in, or skilled at, handling.” This
criticism can be taken as indirect
support for the common practice of
defining termination in terms of the
distribution of the length of therapy
measures, since in any given setting,
the median or mean length takes 
some account of the effects of policy 
and staff interests. 

Research on the prediction of
termination by the use of projective tests
shows a familiar, monotonous pattern:
initial positive results with
subsequent negative or indeterminate
cross-validation. Kotkov and Meadow
(1952, 1953) began with 12
formal scores, and validated one of
these (FC/CF). They applied a
formula based on three scores (FC/CF,
R, D%) to another sample, and
D% washed out. When these same
signs are examined in an earlier study 
(Rogers, Knauss, & Hammond, 
1951), none were significant, and only 
R was in the predicted direction. 
Auld and Eron (1953) tried a further 
validation of the Kotkov and Mead- 
ow formula, and obtained insignif- 
icant results. They found the Wechs- 
ler-Bellevue IQ accounted for the one 
Rorschach variable, R, which held up 
in their sample. 

Starting anew, Gibby et al. (1954; 
Gibby, Stotsky, Miller, & Hiler, 
1953) found 9 of 31 Rorschach signs 
promising. Taking the 9 to a second 
sample, 3 held up (R, K, m) and a 
predictive formula based on these 
variables was applied to a further 
independent sample, and afforded 
68% correct prediction. However, 
knowledge of the base rate would 
have allowed 60% correct prediction, 
so the results were not strong enough 
to be of practical use. In their sample 
the Kotkov and Meadow formula did 
no better than chance, and IQ was 
not related to the criterion. Affleck 
and Mednick (1959) used an equation 
based on R, M, and H to predict who 
would remain for longer than three 
interviews. Their equation allowed 
71% correct prediction in a valida- 
tion sample. Their terminators were 
lower in IQ than the continuers
(significant at the .06 level). This is
consistent with the findings of Auld and
Eron (1953).

All of the above Rorschach studies
except for Auld and Eron used
equivalent samples of VA males being seen on an
outpatient basis, so the samples were more
comparable in some respects from
study to study than is true of most
validation research in this area. Of
all the Rorschach signs only R seems
to have maintained its promise in
these studies. More recent work
(Gallagher, 1953, 1954; Taulbee, 
1958) supports the conclusion that 
the number of Rorschach responses 
(R) relates to termination. However,
the Rorschach is probably an unnec- 
essarily cumbersome way of measur- 
ing this variable; for instance, Gal- 
lagher (1954) found that the number 
of words used on the Mooney Prob- 
lem Check List to describe the cli- 
ents’ problems was a better predictor 
than R. 

Libo (1957) used a TAT-type test 
to predict which patients
would return the week after the test 
was administered. For 40 subjects he 
was able to make a significant
prediction based on an “attraction score”:
the number of references in the stories 
to a desired move toward the thera- 
pist, or of anticipated satisfactions 
from therapy. 

Three studies dealt with the predic- 
tion of termination in a tuberculosis 
hospital. Vernier, Whiting, and 
Meltzer (1955) were able to differ- 
entiate patients who left the hospital 
against medical advice from those 
who continued treatment to the end, 
using the Rorschach and H-T-P tests. 
The TAT did not discriminate. 
Moran, Fairweather, and Morton 
(1956), using a biographical
inventory and an attitude questionnaire,
found that only prehospital life 
adjustment predicted who would 
leave the hospital prematurely, with 
those leaving having a long history of 
being unable to adjust to their life 
situations. Calden, Thurston, Stew- 
art, and Vineberg (1955) developed 
and cross-validated a scale from the 
MMPI to predict premature dis- 
charge. 

Taulbee (1958) developed a key 
based on the MMPI and the Ror- 
schach to predict continuation of 
outpatient psychotherapy beyond the 
thirteenth interview. His results, not 
cross-validated, led him to conclude
that those who continue in therapy 
are less defensive, and more persist- 
ent, dependent, anxious, and intro- 
spective than the terminators. Sulli- 
van et al. (1958) reported no signifi- 
cant difference between MMPI scores 
of terminators and continuers on a
VA male sample. Of a number of 
variables only education and occupa- 
tion related to the criterion. Conrad 
(1954) had therapists fill out a check 
list covering positive mental health, 
social conformity, and behavior pa- 
thology on VA outpatients with 
differing lengths of stay in psycho- 
therapy. Continuers tended to look 
least disturbed initially, and to be at 
the median rather than at either 
extreme on social conformity. 

Rubinstein and Lorr (1956) found 
differences between extreme groups 
(patients in psychotherapy for over 6 
months vs. patients who had come 
less than six times and had termi- 
nated against the wishes of the thera- 
pist), on the authoritarian F Scale, 
and a vocabulary test. However, a 
later study (Lorr et al., 1958) which 
defined termination as having less 
than 7 weeks of psychotherapy, did 
not give significant results, though 
the scales were in the predicted direc- 
tion. They combined a number of 
scales in a further attempt, and 
obtained a significant multiple corre- 
lation in a validation sample. How- 
ever, the scales allowed no better 
prediction than the interviewer’s
judgment.

A large recent project on termina- 
tion was carried on at Johns Hopkins 
University (Frank, Gliedman, Imber,
Nash, & Stone, 1957; Gliedman et al.,
1957; Imber, Frank, Gliedman, Nash,
& Stone, 1956; Imber, Nash, & Stone,
1955; Nash, Frank, Gliedman, Imber,
& Stone, 1957). Their prognostic
battery included an inventory and a
Sway test. Those who stayed in
therapy more than three interviews
were more suggestible on the Sway 
test, were more sociable, of higher 
socio-economic status, and more likely 
to see treatment as a means of main- 
taining status in their immediate so- 
cial environment, than the termina- 
tors. When they compared group 
versus individual psychotherapy they 
found an interaction between
treatment and termination: in group
therapy, the terminators were more
socially ineffective than the continuers,
while the relationship was reversed 
for those getting individual therapy. 
This intriguing finding may have been 
related to an unequal distribution of 
social levels in the two groups—most 
of the lower class patients ended up in 
group psychotherapy, while most of 
the middle class patients were as- 
signed to individual psychotherapy. 

Hiler (1959) studied initial
complaints, and concluded that
continuers come to a clinic with typical
psychoneurotic symptoms (obsessions,
phobias, anxiety, depression, poor
concentration), while early
terminators are more likely to list purely
organic symptoms, antisocial acts, or
schizoid feelings. His continuers also
obtained higher scores on the
Wechsler-Bellevue, with a characteristic
subtest pattern involving the relation of
Similarities to Digit Span and Digit
Symbol (Hiler, 1958b).

How much overlap is there between
predictors of termination and
improvement? Sullivan et al. (1958)
examined the relationship of
test scores and demographic variables
to both improvement and termination
criteria. Only occupational
level was related significantly to both.
Katz et al. (1958) found that none of their
predictors of length of stay correlated
with ratings of improvement.
Frank et al. (1957) reported that a
history of social activity and a
fluctuating course of illness were
associated with continuation and
improvement. A short duration of 
illness was associated with termina- 
tion as well as improvement. Gal- 
lagher (1954) found the Taylor Mani- 
fest Anxiety Scale predicted contin- 
uation as well as improvement. In 
general the results suggest little over- 
lap. This is somewhat unexpected, 
since as was mentioned earlier, there 
appears to be a positive relationship 
between criteria of duration of treat- 
ment and improvement. The most 
tenable assumption would seem to be 
that the variance shared by the two 
criteria is different from the variance 
shared by predictor and criterion. 
Possibly the correlation between
criteria is due to rater bias.


Discussion 


The previous sections of this paper 
have included the word “selection” in 
order to underline the fact that the 
practical need to predict to any of 
these criteria exists only when some 
sort of selection is necessary. For 
example, if the waiting list of an 
outpatient clinic is too long, selection 
of cases to receive treatment can be 
made on the basis of predicted prob- 
ability of improving or terminating. 
If there is no need to deny treatment 
to anyone, knowledge of these prog- 
nostic probabilities is of no practical 
use. In most mental treatment cen- 
ters today administrative procedures 
probably do not involve rejection of 
the patient as an alternative action, 
except in some outpatient clinics. 
Prognosis would be indispensable in 
the question of treatment selection, if 
differential effects of treatment were 
known; our survey has suggested that 
such effects have not yet been dem- 
onstrated. Thus it could be argued 
that prognosis is a sleeping giant at 
the present time, awaiting a future 
chance to be of service. Several other 
uses can be made of prognostic information, of course. Knowledge of the
variables which relate to changes in 
duration, course, or outcome of 
mental illness is of theoretical impor- 
tance, an aid to understanding. A 
second promising use has been pro- 
posed by Feldman (1952) and Zubin 
(1959). They recommend that in 
nonprognostic research, prognostic
status be tried as a method of classi- 
fying patients into homogeneous 
categories, in place of diagnosis. 

Is such a suggestion tantamount to 
substituting a measure of severity of 
illness for one of type of illness? The 
literature survey indicates a wide 
variety of tests have shown positive 
results, with no discernible common 
characteristic except that they meas- 
ure adequacy of functioning, directly 
or indirectly. The fact that the same 
measures do not predict for all pa- 
tients may be due to differences in the 
type and etiology of symptoms from 
patient to patient; but such differ- 
ences do not vitiate the possibility 
that when prediction occurs it is 
largely because the dimension of 
severity of illness has been accurately 
assessed by the test. In any case, the 
effect of matching groups on prog- 
nostic variables would be to control 
for base rate differences in improve- 
ment, a procedure which is impera- 
tive for many kinds of evaluational 
studies, though rarely invoked in 
research on therapy. 

As with all predictive questions, 
the primary problem in prognosis is 
the definition of the criterion. From 
the point of view of decision theory, 
the general notion of “outcome of illness” involves assigning utility
values to specific outcomes; and since 
cost of achieving any given outcome 
may be a factor, an explication of the 
treatment strategies is also necessary. 

The low interjudge reliabilities which 
obtain in judgments of improvement 
indicate that utility of outcome may 
differ from judge to judge. A program for achieving a more objective
ranking of treatments, outcomes, 
or treatment-outcome combinations 
seems called for. Cronbach and 
Gleser (1957) offer a possible frame- 
work for such a program, and most of 
the points they make, although deal- 
ing with personnel selection, can be 
easily generalized to prognosis. 

A frequent misinterpretation of 
empirical research is that it is based 
on no theory. In the sense of a con- 
tent theory—i.e., a theory stating 
relationships between tangibles or 
concepts related to tangibles—em- 
pirical research is usually weak, 
though in the selection of measures 
some sort of rough theory has to 
be involved. However, empirical re- 
search often is strongly tied to a 
mathematical model. In prognosis 
the guiding model has been the linear 
regression model. The studies have 
assumed that a measurable quality 
exists which is linearly related to out- 
come. The findings in respect to per- 
formance differences between acute 
and chronic patients (Burdock et al., 
1958) suggest that this linear model 
probably will have to take account 
of interaction effects. If so, almost 
all studies to date are too simple in 
design. They involve a one-stage decision: look at one final score per person (the final score may of course be
a combination of several subsidiary 
scores) and assign the patient to an 
outcome (criterion category) by 
whatever rule of operation is being 
applied to the score. The work of 
Zubin’s group indicates that at least a two-stage decision process is needed: (a) a score is obtained to decide which of several operations will be applied to a second score; and (b) the second score is used to assign patients to the criterion category. Indeed there is no reason why tests should not be useful as a basis for deciding what operational rule
to apply to other data. The variables 
which appear to have the strongest 
relationship to outcome have been 
nontest variables: severity and dura- 
tion of illness, acuteness of onset, 
degree of precipitating stress, etc. 
A possible direction of research 
might be to use tests to increase the 
validity of the nontest variables, 
either by trying to find tests which 
tap interactions, or which correlate 
with the error term in the psychiatric 
predictor. This latter approach has 
not been tried in prognosis, but it has 
been used with some success in personnel selection (Fulkerson, 1959;
Ghiselli, 1956). A third suggested 
avenue of research would be to apply 
nonlinear or configurational models 
to prognostic data. The general point 
to be made is that prognosis research 
seems to require a different, more 
complex, mathematical model, and 
thus a more complex research design, 
than has been generally used so far. 
Specifically the one-stage design, 
where a predictor is correlated with 
an outcome measure, would appear to 
be inadequate in this field. 
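The two-stage process attributed to Zubin's group can be sketched in a few lines. This is a hypothetical illustration, not a reconstruction of any procedure in the studies reviewed. The score names, the subgroup split, and the cutoffs are all invented. The point is only structural: the first score selects which rule is applied to the second.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical scores; names, scales, and values are illustrative only.
    stage1_score: float  # used only to choose which rule applies
    stage2_score: float  # used to assign the criterion category


def one_stage(p: Patient, cutoff: float = 50.0) -> str:
    """One-stage design: one cutoff applied to one final score."""
    return "improve" if p.stage2_score >= cutoff else "not improve"


def two_stage(p: Patient) -> str:
    """Two-stage design: stage 1 selects the rule applied in stage 2.

    The subgroup split (stage1_score >= 0) and both cutoffs are
    hypothetical; they only show that the same stage-2 score can lead to
    different assignments in different subgroups (an interaction).
    """
    if p.stage1_score >= 0:
        # Rule A (hypothetical): higher stage-2 scores predict improvement.
        return "improve" if p.stage2_score >= 40.0 else "not improve"
    # Rule B (hypothetical): in the other subgroup the relation reverses.
    return "improve" if p.stage2_score < 60.0 else "not improve"


if __name__ == "__main__":
    for p in (Patient(1.0, 70.0), Patient(-1.0, 70.0)):
        print(p, "one-stage:", one_stage(p), "| two-stage:", two_stage(p))
```

Both patients have the same second score, 70. The one-stage rule assigns both to "improve". The two-stage rule separates them according to the first score, which no rule applied to the second score alone can do.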

COMPLEX SOUNDS AND CRITICAL BANDS

Studies of the responses of human
observers to bands of noise and other 
complex sounds have led to the meas- 
ure of what appears to be a basic unit 
of hearing, the critical band. When 
the frequency spectrum of a stimu- 
lating sound is narrower than the 
critical band, the ear reacts one way; 
when the spectrum is wider, it reacts 
another way. For example, experiments show that, for bandwidths narrower than the critical band, both loudness and absolute threshold are independent of bandwidth. Only when the critical bandwidth is exceeded do the loudness and the absolute threshold increase with the width (Gässler, 1954; Zwicker & Feldtkeller, 1955; Zwicker, Flottorp, & Stevens, 1957).

The critical band has also been 
measured in experiments on auditory 
discriminations that seem to depend 
upon phase (Zwicker, 1952) and in 
experiments on the masking of a 
narrow band of noise by two tones 
(Zwicker, 1954). In all four types of 
experiment—loudness, threshold, 
sensitivity to phase, and two-tone
masking—the value of the critical 
band is the same function of its 
center frequency. The values of the 
critical band, as a function of the fre- 
quency at the center of the band, are 
given by the top curve in Figure 1.
The ordinate gives the width (ΔF),
in cycles per second, of the critical 
band; the abscissa gives the center 
frequency. As the frequency at the 
center of a complex sound increases, 
the critical band that is measured 
around the center frequency becomes 
wider. 
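The top curve of Figure 1 is often summarized by an analytic approximation that Zwicker and Terhardt published later, in 1980: ΔF ≈ 25 + 75[1 + 1.4(f/1000)²]^0.69 cps, where f is the center frequency. That formula is not part of the work reviewed here. It is used below only as an assumption, to illustrate the trend that the critical band widens as its center frequency rises.

```python
def critical_bandwidth(center_hz: float) -> float:
    """Approximate critical bandwidth (cps) at a given center frequency.

    Assumption: uses the Zwicker & Terhardt (1980) analytic fit,
    not values read from Figure 1 of this review.
    """
    f_khz = center_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


if __name__ == "__main__":
    for fc in (250, 500, 1000, 2000, 4000):
        print(f"{fc:5d} cps -> critical band ~ {critical_bandwidth(fc):6.0f} cps")
```

At 1000 cps the fit gives about 162 cps. That is close to the 160- to 165-cps values reported for 1000 cps in the studies reviewed here.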

Not only does the critical band 
have the same values when measured 
for several kinds of auditory response, 
it is also independent of such stimu- 
lus parameters as the number of 
components in the complex (Scharf, 
1959b) and the sound pressure level 
(Feldtkeller, 1955; Feldtkeller & 
Zwicker, 1956). 

Prior to the experimental measures 
of the critical band, Fletcher (1940) 
had hypothesized the existence of a 
critical band for masking. He sug- 
gested that when a white noise just 
masks a tone, only a relatively nar- 
row band of frequencies surrounding 
the tone does the masking, energy 
outside the band contributing little 
or nothing. Although attempts to 
test this hypothesis remain incon- 
clusive, investigators (Bilger & Hirsh, 
1956; Hawkins & Stevens, 1950) have 
been able to calculate values for the 
width of these hypothetical masking 
bands by assuming that the masking 
band and the just-masked tone have 
the same intensity. The calculated 
values, which are labeled “critical ratios” in Figure 1, are smaller for the
masking band than for the critical 
band as measured in the experiments 
cited above. As we shall see, this dis- 
crepancy is more apparent than real. 


[Figure 1 plot: ordinate, ΔF in cycles per second; abscissa, frequency in cycles per second]


Fig. 1. The width, ΔF, of the critical band
and of the critical ratio as a function of the 
frequency at the center of the band. (The or- 
dinate gives the width, in cycles per second, 
of the critical band—and of the critical ratio— 
for the center frequencies shown on the abscissa. The top curve gives the values for the critical band which are based upon direct measurements in four types of experiment;
the bottom curve gives the values for the 
critical ratio which are calculated from meas- 
urements of the masked threshold for pure 
tones in white noise. The points on the bottom 
curve are from Hawkins and Stevens—1950. 
This figure is adapted from an article by 
Zwicker, Flottorp, and Stevens—1957, p. 556 
—which contains also a table of critical-band 
values.) (Adapted with permission of the 
Journal of the Acoustical Society of America) 


EXPERIMENTAL MEASURES OF THE 
CRITICAL BAND 


Four types of experiment in which 
critical bands have been measured 
are reviewed: absolute threshold of 
complex sounds, masking of a band 
of noise by two tones, sensitivity to 
phase differences, and loudness. 


Threshold of Complex Sounds 


When two tones, whose frequencies 
are not too far apart, are presented 
simultaneously, a subject may report 
hearing a sound even though either 
tone by itself is below threshold. 
Gässler (1954) made careful measurements of this phenomenon, using many tones and systematically varying the difference in frequency, ΔF, between the lowest and highest components of the complex sounds.² He varied ΔF by varying the number of equally intense tones, which were spaced at intervals of 20 cps. The number of tones was increased from 1 to 40 or until ΔF was equal to 780 cps. Each
time a tone was added, the threshold 
for the whole complex was measured 
by a “tracking” method (Stevens, 
1958). It was necessary, of course, 
that all the tones in the complex have 
the same threshold when heard 
singly, for otherwise it would have 
been impossible to determine the 
precise cause of a change in the 
threshold for a complex whose ΔF
had been increased by the addition 
of a tone. Thus measurements were 
restricted to portions of the fre- 
quency spectrum over which a sub- 
ject’s threshold curve was flat. In 
order to study other portions of the spectrum, the multitone complexes were presented against a background of white noise that had been tailored to raise the threshold for tones at all the audible frequencies to the same level, thus artificially flattening a subject's threshold curve.

2 Two or more tones constitute a complex sound, i.e., a sound with energy at more than one frequency, in contrast to a single or pure tone with most of its energy concentrated at a single frequency.

Whether the background was quiet, or consisted of a noise at 0 db. SPL, at 20 db., or at 40 db., the same effect was noted: as soon as ΔF exceeded a particular value whose size depended upon the frequency at the center of the complex, the threshold for the multitone complex began to increase. Similar data were reported when bands of white noise were substituted for the multitone complexes. The results indicate that the total energy necessary for a sound to be heard remains constant so long as the energy is contained within a limiting bandwidth. Differences between the two observers in these experiments were sometimes of the order of 40%. Even so, the average size of the limiting bandwidths for both multitone complexes and bands of noise is approximated by the critical-band curve of Figure 1.³
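The energy-summation result can be written as an idealized rule. The sketch below makes two simplifying assumptions, neither of which is claimed by Gässler. First, a complex is heard when the energy falling within one critical band reaches the single-tone threshold energy. Second, for ΔF beyond the critical band, only about CB/ΔF of the total energy falls within any one band. Levels are in decibels relative to an arbitrary single-tone threshold. The 20-cps spacing and the 165-cps band at 1000 cps are values mentioned in the text.

```python
import math


def complex_threshold_db(n_tones: int, spacing_hz: float,
                         single_tone_threshold_db: float,
                         critical_band_hz: float) -> tuple[float, float]:
    """Idealized threshold of a complex of n equally intense tones.

    Returns (overall level, level of each component), both in dB.
    Assumption: detection requires the energy within one critical band to
    equal the single-tone threshold energy.
    """
    delta_f = (n_tones - 1) * spacing_hz  # lowest-to-highest component span
    overall = single_tone_threshold_db
    if delta_f > critical_band_hz:
        # Roughly CB/ΔF of the total energy falls in any one critical band,
        # so the overall level must rise by 10*log10(ΔF/CB).
        overall += 10 * math.log10(delta_f / critical_band_hz)
    # Sharing the total energy equally among n components lowers each by 10*log10(n).
    per_component = overall - 10 * math.log10(n_tones)
    return overall, per_component


if __name__ == "__main__":
    for n in (1, 5, 9, 20, 40):
        total, each = complex_threshold_db(n, 20.0, 0.0, 165.0)
        print(f"n={n:2d}  ΔF={(n - 1) * 20:4d} cps  overall={total:5.2f} dB  per tone={each:6.2f} dB")
```

Up to nine tones (ΔF = 160 cps), the overall level at threshold stays fixed while each component can be weaker. Beyond the critical band, the overall level must rise.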


Two-Tone Masking 


The masking of a narrow-band 
noise by two tones provided a second 
measure of the critical band. Using 
a tracking method, Zwicker (1954) 
measured the threshold of a narrow- 
band noise in the presence of two 
tones, one on either side of the noise. 
Increasing the difference in frequency, ΔF, between the two tones left the masked threshold for the noise unchanged until a critical ΔF was reached. At that point the threshold fell sharply and, in general, continued to fall as ΔF was increased further. The two subjects who served in this experiment showed the same drop in threshold at approximately the same ΔF for a given center frequency, regardless of the SPL of the masking tones. The critical-band curve of Figure 1 gives the approximate values of ΔF at which the masking effect of two tones is sharply reduced.


3 Gässler (1954) measured a critical band of 165 cps at 1000 cps. Garner (1947) had written earlier that “the best estimate . . . is that a band of frequencies no wider than 175 cps around 1000 cps is necessary if temporal integration of acoustic energy is to be perfect” (p. 813). His estimate was based upon measurements of the threshold changes for a wide-band noise, an unfiltered 1000-cycle tone, and a filtered 1000-cycle tone as a function of bandwidth. The bandwidth was varied by varying the duration of the signal.
Sensitivity to Phase


The critical band is also relevant 
to phase sensitivity, measured by a 
comparison between the ear's ability
to detect amplitude modulation 
(AM) and its ability to detect fre- 
quency modulation (FM). This pro- 
cedure requires some explanation. 

When the amplitude of a tone is 
modulated—i.e., alternately in- 
creased and decreased—a three-tone 
complex is produced with the original 
tone (the “carrier”) at the center of 
the complex and a tone on either 
side (side bands). When the frequency 
of a tone is modulated over a narrow 
range, a three-tone complex is also 
produced.⁴ The only important difference between the three-tone complex that is produced under AM and
the complex that is produced under 
FM concerns the phase relations 
among the components. Conse- 
quently, any difference in the ear's 
sensitivity to AM and FM would 
presumably depend upon these phase 
relations. 
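The statement that AM and narrow-range FM both yield three-tone complexes, differing only in phase, follows from trigonometric identities, which the sketch below encodes.

* **AM.** A tone at carrier frequency fc, modulated at rate fm with depth m, has components at fc - fm, fc, and fc + fm. The side bands have amplitude m/2.
* **FM with a small modulation index beta.** The standard narrow-band approximation gives the same frequencies and magnitudes, but the lower side band is inverted in sign. This approximation is valid only for small beta.

In both cases the side bands are therefore 2fm apart. The carrier, rate, and depth values used here are arbitrary.

```python
import math


def am_components(fc: float, fm: float, m: float):
    """(frequency, amplitude) pairs for (1 + m*cos(2π fm t)) * cos(2π fc t)."""
    return [(fc - fm, m / 2), (fc, 1.0), (fc + fm, m / 2)]


def narrowband_fm_components(fc: float, fm: float, beta: float):
    """Small-beta approximation of cos(2π fc t + beta*sin(2π fm t)).

    Same frequencies and magnitudes as AM (with m = beta), but the lower
    side band is inverted, i.e., the phase relations differ.
    """
    return [(fc - fm, -beta / 2), (fc, 1.0), (fc + fm, beta / 2)]


def side_band_separation(fm: float) -> float:
    """ΔF between the two side bands: (fc + fm) - (fc - fm) = 2*fm."""
    return 2 * fm


def synthesize(components, t: float) -> float:
    """Sum of cosines with the given (frequency, amplitude) pairs at time t."""
    return sum(a * math.cos(2 * math.pi * f * t) for f, a in components)


def max_error(exact, components, duration: float = 0.5, n: int = 2000) -> float:
    """Largest difference between a waveform and its component sum on a time grid."""
    times = (k * duration / n for k in range(n))
    return max(abs(exact(t) - synthesize(components, t)) for t in times)


if __name__ == "__main__":
    fc, fm = 1000.0, 4.0  # arbitrary carrier and modulation rate (cps)
    print("AM components:", am_components(fc, fm, 0.5))
    print("side-band separation ΔF =", side_band_separation(fm), "cps")
```

The AM decomposition is exact. The FM version differs from the true waveform by terms of order beta², which is why it describes only narrow-range modulation.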

Zwicker (1952) found, indeed, that 
in order for a subject to just hear a 
difference between a modulated and 
a pure, unmodulated tone, a smaller 
amount of AM is required than FM. 
The ear is more sensitive to AM than 
to FM, however, only at low rates of 
modulation. As the rate of modula- 
tion is increased, the difference in 
sensitivity to AM and FM gradually 
disappears. How do these results
pertain to the critical band? The rate 
at which a tone is modulated determines the frequency separation, ΔF,
between the side bands of the three- 
tone complex produced under the 
modulation. It turns out that the rate of modulation at which AM and FM become equally difficult to detect corresponds to values of ΔF that are essentially the same as the critical-band values given in Figure 1. Zwicker’s investigation showed, moreover, that the critical band determined by phase sensitivity is independent of the SPL of the modulated tone. It varies only as a function of the frequency of the “carrier,” which lies, of course, at the center of the band.

4 For a lucid discussion of the intricacies of modulation, consult Stevens and Davis (1938, pp. 225-231).

Fig. 2. The loudness level of a band of noise centered at 1000 cps measured as a function of the width of the band. (The parameter is the effective SPL of the noise. The dashed line shows that the bandwidth at which loudness begins to increase is the same at all the levels tested. This figure is adapted from the book, Das Ohr als Nachrichtenempfänger, by Feldtkeller and Zwicker—1956, p. 82.) (Adapted with permission of S. Hirzel Verlag)

Since the complexes produced un- 
der AM and those produced under 
FM differ primarily with respect to 
phase relations, the ear may be able 
to detect AM more easily than FM 
at low rates of modulation because it 
is more sensitive to the kind of phase 
relations that occur under AM. The 
ear seems to be sensitive to the phase relations, however, only when the ΔF of the complex is less than a critical band. When ΔF is greater than a critical band, there is no difference in sensitivity to AM and FM. This implies that, beyond the critical band, the phase relations within the complex no longer serve as a significant cue in the detection of modulation.


Loudness of Complex Sounds 


The critical band has been measured most thoroughly in studies of the loudness of complex sounds as a function of bandwidth. Zwicker and Feldtkeller (1955) demonstrated that the loudness of a white noise is independent of bandwidth until the critical band is exceeded, whereupon the loudness begins to increase. Their procedure was straightforward. They presented a band of filtered white noise and a comparison tone alternately through a single earphone. The subject adjusted the intensity of the tone until the tone and the noise sounded equally loud. The overall SPL of the noise was held constant; only the bandwidth was varied from judgment to judgment. (Zwicker and Feldtkeller did not report the number of subjects or the amount of variability; probably only a few well-trained subjects were used, and the variability was small.)

Figure 2 shows what happens to the loudness of a band of noise when its width is increased. These curves are for bands centered at 1000 cps, which was the geometric mean of the two half-power points. At all the SPLs tested, from 30 to 80 db., the loudness of the noise remains constant and the curve is flat up to a bandwidth of about 160 cps, whereupon the loudness begins to increase. Within the critical band, the noises are as loud as a tone of equal intensity having the same frequency as the center of the band. Functions similar in shape to those in Figure 2 were generated for bands centered at 500, 2000, and 4000 cps. The bandwidth at which loudness begins to increase defines the critical band for loudness. This was found to have approximately the same values as had been measured for threshold, two-tone masking, and phase sensitivity (see Figure 1).
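The shape of the curves in Figure 2 can be mimicked by a toy model: loudness stays flat up to the critical band and rises beyond it, with total intensity held fixed. The model is an illustrative assumption, not Zwicker and Feldtkeller's procedure or theory. It works in three steps:

1. Divide the band into critical bands and give each an equal share of the fixed intensity.
2. Convert each share to loudness by a power law with exponent 0.3 on intensity (roughly the slope of the sone scale).
3. Add the partial loudnesses.

The 160-cps critical band is the 1000-cps value quoted above.

```python
import math


def relative_loudness(bandwidth_hz: float, critical_band_hz: float = 160.0,
                      exponent: float = 0.3) -> float:
    """Toy loudness of a noise band whose total intensity is held fixed.

    Result is relative to the same intensity confined to one critical band.
    Assumptions: intensity is spread evenly; each critical band contributes
    loudness proportional to (its intensity) ** exponent; contributions add.
    """
    n_bands = max(1.0, bandwidth_hz / critical_band_hz)  # narrower bands count as one
    per_band_intensity = 1.0 / n_bands
    return n_bands * per_band_intensity ** exponent  # = n_bands ** (1 - exponent)


def loudness_level_change_phon(ratio: float) -> float:
    """Approximate change in loudness level, taking 10 phons per doubling of loudness."""
    return 10 * math.log2(ratio)


if __name__ == "__main__":
    for width in (40, 80, 160, 320, 640, 1280):
        r = relative_loudness(width)
        print(f"{width:5d} cps  loudness x{r:4.2f}  (+{loudness_level_change_phon(r):4.1f} phons)")
```

In this toy, spreading the energy over two critical bands raises loudness by a factor of about 1.6, or about 7 phons. The model makes no claim about the sizes actually measured.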

Zwicker and Feldtkeller studied 
continuous spectra, i.e., noises that 
have energy at every frequency be- 
tween the cutoff points. Bauch 
(1956) studied line spectra, i.e., 
sounds that have energy at two or 
more separate frequencies. He 
measured the loudness of three-tone 
complexes, produced by amplitude 
modulation, as a function of the 
difference, ΔF, in cps between the lowest and highest components of the complex. Bauch obtained the same results with three-tone complexes centered at various frequencies as Zwicker and Feldtkeller had obtained with bands of noise. For values of ΔF less than a critical band, loudness is constant except when ΔF is so small that beats are heard. The loudness begins to increase as a function of ΔF only when ΔF exceeds the critical band.

At the time that the critical band was being mapped out in Germany at the Technische Hochschule Stuttgart (Bauch, 1956; Gässler, 1954; Zwicker, 1952, 1954; Zwicker & Feldtkeller, 1955), some of us at the Psycho-Acoustic Laboratory at Harvard were puzzled by our failure to find an increase in the loudness of a four-tone complex as a function of ΔF. We had assumed that loudness summation begins as soon as ΔF is increased. We were, however, studying four-tone complexes whose ΔFs were smaller than a critical band. When reports of the critical band came from Germany, our results began to make sense and, indeed, agreed well with those being obtained across the sea. The experiments were continued at Harvard by S. S. Stevens with G. Flottorp from Norway and E. Zwicker from Germany (Zwicker et al., 1957). Four-tone complexes and bands of white noise, at various center frequencies and various SPLs, were studied. In these experiments, 16 to 22 untrained subjects sometimes adjusted the complex sound and sometimes adjusted the comparison tone until the two were equally loud. Figure 3 shows a typical set of results, those for four-tone complexes centered at 1000 cps. Each point is the median of 20 loudness matches. The subjects were somewhat variable in their judgments, but the medians are orderly, and the lines through the data show a break at approximately the same value of ΔF that had been measured in Germany. The critical band made the transatlantic journey safely and invariantly.

Fig. 3. The dependence of the loudness of a four-tone complex, centered at 1000 cps, on spacing and level. (Each point represents the median of two judgments by each of 10 listeners. The symbol T means the comparison tone was adjusted; C means the complex was adjusted. This figure is from Zwicker, Flottorp and Stevens—1957, p. 550.) (Reproduced with permission of the Journal of the Acoustical Society of America)

Another investigation carried out 
at Harvard (Scharf, 1959a) showed 
that at low levels, between 5 and 35 
db. above threshold, where the loud- 
ness of a complex sound increases 
more slowly with bandwidth than at 
higher levels, the critical band must 
be exceeded before loudness begins 
to change as a function of band- 
width. 

Niese (1960), in Dresden, has also 
studied loudness summation and the 
critical band. He presented the 
sound stimuli not only through ear- 
phones (as in all the previous experi- 
ments) but also through a loud- 
speaker in a free field, i.e., in an 
anechoic room where sounds are al- 
most completely absorbed by spe- 
cially constructed walls. The results 
for free-field listening are similar to 
those for earphone listening; the 
loudness of a band of white noise be- 
gins to increase with bandwidth 
when the critical band is exceeded. 
Niese found, however, that the loud- 
ness did not continue to increase in- 
definitely with bandwidth, but in- 
creased about 8 db. and then re- 
mained constant for bandwidths 
greater than 1000 to 5000 cps de- 
pending upon the center frequency. 
It may be that the loudness did not 
increase further because the avail- 
able energy was spread to very low 
and very high frequencies which con- 
tributed little to the total loudness. 

In other experiments, Niese (1960) 
tested the assumption that loudness 
summation is a peripheral process 
occurring independently in each ear. 
In one procedure, a band of white
noise was divided in half at its center 
frequency; the upper half was pre- 
sented through an earphone to one 
ear and the lower half to the other 
ear. The loudness of the noise in 
both ears did not begin to increase 
with bandwidth until the overall 
width exceeded a value approxi- 
mately twice the critical band, i.e., 
until the noise in each ear was wider 
than a single critical band. In a sec- 
ond procedure, two narrow bands, 
each 100 cycles wide, were first pre- 
sented together to one ear and later 
separately to each ear. When pre- 
sented together to a single ear, the 
loudness of the two bands increased 
with the frequency separation be- 
tween them. When, on the other 
hand, one band was presented to 
each ear, the loudness did not in- 
crease with the frequency separa- 
tion, no matter how great it was. 
The loudness did not increase be- 
cause the band of noise presented to 
each ear was never wider than a 
critical band; it was always 100 
cycles wide. Loudness summation 
thus seems to depend only upon the 
distribution of energy in one ear, 
suggesting that summation takes 
place not at some higher level in the 
auditory system where nerve im- 
pulses from the two ears join, but at 
the periphery, probably in the inner 
ear. 
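Niese's conclusion can be stated as a rule about where summation is computed. The sketch below is a toy illustration built on four assumptions:

* Each ear is treated separately.
* The bands delivered to one ear are summarized by their span from lowest to highest frequency, ignoring any empty gaps.
* Loudness is taken to grow only when that span exceeds one critical band.
* The critical band near 1000 cps is 160 cps.

The band edges are hypothetical values chosen to mirror the two procedures described.

```python
def ear_span(bands) -> float:
    """Spectral span (cps) covered by the (low, high) bands sent to one ear."""
    if not bands:
        return 0.0
    return max(high for _, high in bands) - min(low for low, _ in bands)


def loudness_should_grow(left, right, critical_band_hz: float = 160.0) -> bool:
    """Toy rule: summation is computed within each ear separately, so only
    the span reaching a single ear is compared with the critical band."""
    return any(ear_span(bands) > critical_band_hz for bands in (left, right))


if __name__ == "__main__":
    # Procedure 1: a 300-cps band centered at 1000 cps, whole vs. split by ear.
    print(loudness_should_grow([(850, 1150)], []))               # one ear
    print(loudness_should_grow([(850, 1000)], [(1000, 1150)]))   # halves split
    # Procedure 2: two 100-cycle bands, together vs. one per ear.
    print(loudness_should_grow([(900, 1000), (1400, 1500)], []))
    print(loudness_should_grow([(900, 1000)], [(1400, 1500)]))
```

Under this rule, splitting a band between the ears postpones summation until the overall width exceeds about twice the critical band. Sending one 100-cycle band to each ear never produces summation, however far apart the bands are.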

Still another aspect of loudness 
summation has been recently investigated (Scharf, 1959b). The results indicate that the loudness of a complex sound remains essentially unchanged when only the number of components in the complex is varied. The loudness of the complex increases with ΔF when ΔF is greater than a critical band. At any given value of ΔF, however, the loudness is approximately invariant with the number of components, provided the overall sound pressure remains invariant.

The several experiments in loud- 
ness summation, along with those on 
threshold, two-tone masking, and 
phase sensitivity provide a firm body 
of evidence for the critical band. 
There remains, however, the ques- 
tion of the role of the critical band in 
the masking of pure tones by white 
noise. 


MASKING BANDS 


Although the empirical measures 
of the critical band are quite recent, 
the concept of a critical band was 
expounded some 20 years ago by 
Fletcher (1940) when he hypoth- 
esized that: (a) a pure tone that is 
masked by a white noise is in effect 
masked only by a narrow band of 
frequencies surrounding the tone, 
and (b) the intensity of the part of 
the band that does the masking is 
equal to the intensity of the tone. 

Fletcher (1940) presented some 
preliminary experimental results to 
support his thesis, but the projected 
full-scale experiment has apparently 
not been reported. Nonetheless the 
concept of a critical band has be- 
come important in theories about 
masking. Moreover, the acceptance 
of Fletcher’s hypotheses permits the 
calculation of values for the masking 
band from the measurement of the 
masking of pure tones by white noise 
(Hawkins & Stevens, 1950). The 
calculated values for the masking band turn out to be smaller, by a factor of about two and one-half, than the empirical values for the critical band as measured in experiments on loudness, two-tone masking, etc. This discrepancy, however, may be resolved either by a modification of Fletcher’s second hypothesis or, better, by direct measurements of the masking band. Let us turn first to the indirect measurements of the masking band and the assumptions underlying them.


Indirect Measures of the Masking 
Band 


If both Fletcher’s hypotheses 
about the existence of a masking 
band and about the equality of the 
intensities of the tone and noise are 
accepted, it is possible to calculate 
the size of the masking band from the 
masked thresholds for pure tones in 
white noise. Only one empirical 
operation is necessary. The thresh- 
old for a tone is measured in the 
presence of a white noise. From the 
intensity of the just-masked tone 
and the intensity of the masking 
noise, it is fairly simple to calculate 
how large a band within the noise 
contains the same energy as the tone. 
The width of this band is, by defini- 
tion, the masking band. Its width is 
calculated by taking the ratio of the 
intensity of the tone to the intensity 
per cycle of the noise. (Since a white 
noise contains all audible frequencies 
at equal intensity, the intensity per 
cycle is uniform throughout.) For 
example, Hawkins and Stevens (1950) 
found that the ratio between the in- 
tensity of a 1000-cycle tone (at its 
masked threshold) and the intensity 
per cycle of the masking noise is 
63:1 or 18 db. Since the intensity in
each one-cycle band of noise is 1/63
the intensity of the masked tone, a 
band of frequencies 63 cps wide will 
have an overall intensity equal to 
that of the tone. Therefore, accord- 
ing to the second hypothesis, the 
masking band is taken to be 63 cps 
wide for a tone of 1000 cps. Values 
for the masking band that are calcu- 
lated in the foregoing manner will be 
called “critical ratios,” as suggested 
by S. S. Stevens (see Zwicker et al., 1957).
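Once levels are expressed in decibels, the calculation just described reduces to one line. The ratio of tone intensity to noise intensity per cycle is 10 raised to one-tenth of a difference: the tone's level at masked threshold minus the noise spectrum level (the level in a one-cycle band). The absolute levels in the example are hypothetical. Only their 18-db difference is Hawkins and Stevens' 1000-cycle value.

```python
def critical_ratio_hz(tone_level_db: float, noise_spectrum_level_db: float) -> float:
    """Width (cps) of the noise band holding as much energy as the just-masked tone.

    tone_level_db: level of the tone at its masked threshold.
    noise_spectrum_level_db: level of the noise per one-cycle band.
    Uses Fletcher's equal-intensity assumption: width = I_tone / I_per_cycle.
    """
    ratio_db = tone_level_db - noise_spectrum_level_db
    return 10 ** (ratio_db / 10)


if __name__ == "__main__":
    # Hypothetical levels whose difference is the 18 db reported at 1000 cps.
    print(round(critical_ratio_hz(58.0, 40.0), 1))  # about 63 cps
```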
Hawkins and Stevens measured
the masked thresholds at many fre- 
quencies from 100 to 9000 cps in the 
presence of white noise at levels from 
20 to 90 db. They found that the 
ratio of the intensity of a just-masked 
tone to the intensity per cycle of the 
masking noise remains constant at 
all noise levels except the very lowest. 
In other words, the critical ratio does 
not change as a function of the level 
of the masking noise. The critical 
ratio is, however, different at dif- 
ferent center frequencies, as shown in 
Figure 1. The results of these experi- 
ments agree with similar measure- 
ments that Fletcher and Munson 
(1937) had made of the critical ratio 
for tones masked by a uniform mask- 
ing noise. 

Bilger and Hirsh (1956) also calcu- 
lated critical ratios from masking 
data obtained with bands of white 
noise 250 mels wide. (The mel is a 
unit of pitch.) The substitution of a 
250-mel band, which is about five 
times as wide as the critical ratios 
measured by Hawkins and Stevens, 
is consistent with the assumption 
that the energy outside the masking 
band contributes nothing to the 
masking effect. If this, Fletcher's 
fundamental assumption, is true, the
critical ratio should be the same in 
both experiments. The results of the 
two independent experiments were, 
in fact, in close agreement. 

In all these experiments the calcu- 
lated value of the critical ratio de- 
pends upon the measured value of 
the masked threshold which may not 
be very reliable. Blackwell (1953) 
has shown, for example, that the 
value obtained for a threshold depends upon the psychophysical method employed in its measurement.
The congruence of the results of the 
several experiments tends, however, 
to negate this criticism. Using the 
reported threshold measurements, we can modify Fletcher's second assumption so that the masking band has
the same values as the critical band. 

Instead of assuming, quite arbi- 
trarily, that the intensities of the 
masked tone and of the masking 
band are equal, we can just as well 
assume that the intensity of the 
masking band is two-and-one-half 
times as great as that of the masked 
tone. Over most of the frequency 
range, this simple modification of 
Fletcher's second hypothesis yields 
values for the masking band that are 
equal to the measured values of the 
critical band. A simple modification 
succeeds because, as Figure 1 shows, 
except for very low frequencies, the 
critical band and the critical ratio 
are the same functions of center fre- 
quency. Since this new assumption is 
ad hoc and arbitrary, it will probably 
have little appeal. What we need is a 
more direct and straightforward 
type of evidence of the existence of 
the masking band. 
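The effect of the modified assumption is easy to see numerically. Suppose the masking band must carry k times the intensity of the just-masked tone. The implied width is then k times the critical ratio. With k = 2.5, the 63-cps critical ratio at 1000 cps becomes about 158 cps, near the 160-cps critical band reported for loudness. The function names are illustrative. The sketch also shows that the factor of 2.5 amounts to only about 4 db.

```python
import math


def masking_band_hz(critical_ratio_hz: float, band_to_tone_intensity: float = 1.0) -> float:
    """Masking-band width implied by a critical ratio.

    band_to_tone_intensity = 1.0 reproduces Fletcher's second hypothesis;
    2.5 is the ad hoc alternative discussed in the text.
    """
    return band_to_tone_intensity * critical_ratio_hz


def intensity_ratio_db(ratio: float) -> float:
    """Express an intensity ratio in decibels."""
    return 10 * math.log10(ratio)


if __name__ == "__main__":
    cr_1000 = 63.0  # critical ratio at 1000 cps (Hawkins and Stevens)
    print(masking_band_hz(cr_1000))            # Fletcher: 63 cps
    print(masking_band_hz(cr_1000, 2.5))       # modified: 157.5 cps
    print(round(intensity_ratio_db(2.5), 1))   # the factor amounts to about 4 db
```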


Direct Measures of the Masking Band 


The direct measurement of the 
masking band requires the sampling 
of the masked threshold for tones in 
the presence of bands of noise of dif- 
ferent widths. If a masking band 
exists, the tone should become more 
difficult to detect as the bandwidth 
of the noise is increased up to the 
value of the masking band. Increasing 
the bandwidth beyond the masking
band should not raise the threshold 
for the tone any further. (In such 
experiments, energy is added to the 
noise as the bandwidth is increased, 
unlike experiments on loudness sum- 
mation where a constant amount of 
noise energy is spread over a wider 
frequency range in order to increase 
the bandwidth.) Direct measurements of this type have been reported
by Fletcher (1940), Hamilton (1957), 
and Schafer, Gales, Shewmaker, and 
Thompson (1950). Some of the re- 
cent experiments suggest that the 
masking band is larger than the crit- 
ical ratio and may approximate the 
critical band as measured for other 
auditory phenomena. 
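The logic of the direct measurement can be illustrated with an idealized model. Energy is added as the band is widened, so the spectrum level stays fixed. If only the noise inside the masking band masks the tone, the threshold should rise by about 3 db per doubling of bandwidth up to the width of the masking band, and then stay flat. The Python sketch below assumes a rectangular masking band 145 cps wide, a spectrum level of 40 db, and a fixed signal/noise ratio at threshold of -4 db. All three values, and the function name, are hypothetical choices made for illustration.

```python
import math


def predicted_threshold_db(bandwidth_cps: float, spectrum_level_db: float = 40.0,
                           masking_band_cps: float = 145.0, k_db: float = -4.0) -> float:
    """Idealized masked threshold for a tone at the center of a band of noise.

    Only noise inside the (hypothetical, rectangular) masking band masks the tone,
    so the effective noise power is N0 * min(W, masking band). k_db is an assumed
    constant signal/noise ratio at threshold.
    """
    effective_width = min(bandwidth_cps, masking_band_cps)
    noise_in_band_db = spectrum_level_db + 10 * math.log10(effective_width)
    return noise_in_band_db + k_db


for w in (25, 50, 100, 145, 300, 1100):
    print(f"{w:5d} cps -> {predicted_threshold_db(w):5.1f} db")
```

Holding the signal/noise ratio fixed inside the band is a simplification. Real data would also carry the measurement variability described in the following paragraphs.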

In the first and most famous of 
these experiments, Fletcher (1940) 
measured the threshold for tones of 
seven different frequencies ranging 
from 125 to 8000 cps in the presence 
of bands of noise of various widths. 
No information about subjects, ap- 
paratus, or procedure was given. The 
results of this admittedly preliminary 
experiment provided some evidence 
for the masking-band hypothesis; 
the masked threshold tended first to 
increase and then to remain constant 
as the bandwidth of the masking 
noise was increased. The results 
seemed also to justify the assump- 
tion that, within the masking band, 
the intensity of the noise and the just- 
masked tone are equal: a band of 
noise, 30 cps wide, just masked a 
tone lying at its center frequency and 
having the same intensity. Precise 
determinations of the width of the 
masking band were not possible, 
however, because the data were 
highly variable and only a few band- 
widths had been sampled. Of band- 
widths having values in the vicinity 
of those for the masking band, only 
one, 200 cps wide, was adequately 
sampled. Nevertheless, relying heav- 
ily upon the assumption that the 
masking band and the just-masked 
tone are equally intense and upon 
the threshold measurements made in 
the presence of wide-band noise, 
Fletcher suggested values for the
width of the masking band. These
values, which Fletcher cautioned
might be wrong by a factor of two,
turned out to be nearly the
same as the critical ratios reported
in 1950 by Hawkins and Stevens (see
Figure 1). This similarity is not sur-
prising, for the values recommended
by Fletcher were, in effect, critical
ratios. While suggestive, Fletcher's
results provided neither conclusive 
support for his hypotheses nor a solid 
basis for the direct measurement of 
the width of the masking band. 

Hamilton's (1957) more recent 
work provides a direct and precise 
measure of the masking band. Meas- 
uring the masked threshold for an 
800-cycle tone in the presence of 
bands of noise that were centered at 
800 cps and that varied in width 
from 19 to 1100 cps, he found that up 
to a bandwidth of 145 cps the masked 
threshold increased as the width of 
the masking noise increased. Beyond 
145 cps the threshold remained con- 
stant, indicating that the masking 
band at 800 cps is 145 cps wide. The 
critical band measured in four other 
types of experiment is also about 145 
cycles wide at 800 cps (see Figure 1). 
This coincidence of values is remark- 
able in view of the variability in- 
herent in these experiments and 
Hamilton’s apparent unfamiliarity 
with the other measures of the critical band.
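Locating the point where the threshold stops rising, as Hamilton did at 145 cps, amounts to fitting a "broken-stick" curve: threshold = a + 10 log10(min(W, B)), where the break point B is unknown. The sketch below shows one simple way to do this: a least-squares grid search over candidate values of B. It does not describe Hamilton's own analysis. The data points are synthesized from a 145-cps break plus small fixed deviations; they are not measurements, and the function name is hypothetical.

```python
import math


def fit_breakpoint(bandwidths, thresholds_db, candidates):
    """Grid-search the break point B of t = a + 10*log10(min(W, B)).

    For each candidate B, the least-squares offset a is simply the mean residual,
    so only the sums of squared errors need to be compared.
    """
    best = None
    for b in candidates:
        x = [10 * math.log10(min(w, b)) for w in bandwidths]
        a = sum(t - xi for t, xi in zip(thresholds_db, x)) / len(x)
        sse = sum((t - a - xi) ** 2 for t, xi in zip(thresholds_db, x))
        if best is None or sse < best[1]:
            best = (b, sse, a)
    return best


# Hypothetical data: a 145-cps break plus small fixed deviations (not real data).
widths = [20, 40, 80, 120, 160, 250, 500, 1100]
wiggle = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.2, -0.2]
data = [36.0 + 10 * math.log10(min(w, 145)) + e for w, e in zip(widths, wiggle)]

b, sse, a = fit_breakpoint(widths, data, candidates=range(50, 501, 5))
print(f"estimated break point ~ {b} cps (SSE {sse:.2f})")
```

With these synthetic points, the search recovers a break near 145 cps. Noisier data, or only a few bandwidths sampled near the break (the difficulty noted above for Fletcher's experiment), would make the estimate correspondingly less certain.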

A second important result in Ham- 
ilton’s experiment shows that the 
difference (the signal/noise ratio) be- 
tween the intensity of the 
tone at its masked threshold and the 
overall intensity of the masking noise 
is not constant, even when the width 
of the masking noise is less than a 
critical band. The signal/noise ratio 
decreases from about 0 db. for a band 
30 cps wide to almost -4 db. for the
critical width of 145 cps. (Hamilton 
reports similar results by Bauman, 
Dieter, Lieberman, and Finney, 
1953.) Fletcher had also found that 
a band 30 cps wide just masks a tone
at its center when the signal/noise 
ratio is 0 db., i.e., when the intensi- 
ties of the tone and the noise are 
equal. This equality at a width of 30 
cps suggested that at the critical band- 
width also, the tone and noise have 
the same intensity. Hamilton showed, 
however, that at the critical band- 
width the signal/noise ratio is not 
the same as at 30 cps. Accordingly, 
Fletcher's threshold measurements 
for a tone in a 30-cps-wide band of 
noise probably lend no support to the 
critical-ratio hypothesis; they are, 
however, consistent with critical- 
band values for the masking band. 
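The size of Hamilton's effect follows from simple decibel arithmetic. At a constant spectrum level, widening the band from 30 to 145 cps raises the overall noise level by 10 log10(145/30), or about 6.8 db. The signal/noise ratio is the tone level minus the overall noise level. A drop in that ratio from about 0 to about -4 db therefore means the masked threshold for the tone rose by only about 2.8 db over the same range. The short Python sketch below carries out this calculation using the approximate values given in the text; the helper name is illustrative.

```python
import math


def noise_level_change_db(w1_cps: float, w2_cps: float) -> float:
    """Change in overall noise level when the band widens from w1 to w2 at a fixed spectrum level."""
    return 10 * math.log10(w2_cps / w1_cps)


noise_rise = noise_level_change_db(30, 145)       # about 6.8 db
sn_at_30, sn_at_145 = 0.0, -4.0                   # approximate values reported by Hamilton
tone_rise = noise_rise + (sn_at_145 - sn_at_30)   # tone change = noise change + S/N change
print(f"noise +{noise_rise:.1f} db, tone threshold +{tone_rise:.1f} db")
```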
Although Hamilton studied only 
one frequency, his results provide 
valuable information because they 
are orderly and self-consistent. Prob- 
ably the use of a forced-choice proce- 
dure with well-trained subjects con- 
tributed to the preciseness of the re- 
sults. In contrast, Schafer et al. 
(1950) report a more extensive ex- 
periment whose results are difficult to 
interpret. They measured the masked 
threshold for tones in three frequency 
regions as a function of the band- 
width of the surrounding noise. In- 
stead of the usual white noise, they 
used bands of synthetic noise com- 
posed of tones one cycle apart. Pre- 
liminary experiments indicated no 
important difference between these 
bands of synthetic noise and bands 
of white noise. Twenty-five subjects 
served in the main experiments in 
which a random method of limits 
was used to measure the masked 
threshold for a tone that had been 
matched in pitch to the masking 
noise. The results suggest the pres- 
ence of a masking band, but since no 
sharp change in the masked thresh- 
old was observed as the bandwidth 
was increased, the width of the masking band can be estimated only approximately. In the three frequency
regions that were tested, the results 
suggest a masking band that is larger 
than that given by the critical ratio, 
and one that could well be as large as 
a critical band. 

Schafer et al. (1950) interpreted 
their results to indicate no change in 
the signal/noise ratio within the 
masking band. Hamilton (1957), on 
the other hand, did find a small but
consistent change in the signal/noise
ratio within the masking band. Since,
however, Schafer's observers were
too variable to permit a precise meas- 
urement of changes in the signal/ 
noise ratio, the small difference be- 
tween the results of the two experi- 
ments is probably not significant. 
There is also some question about 
what Schafer et al. measured. Their 
use of a tone “matched in pitch to the 
masking noise” may account for some 
of the disparity between their results 
and Hamilton's. 

These two experiments, by Hamil- 
ton and by Schafer, seem to be the 
only direct tests of the masking-band 
hypothesis since Fletcher's original 
attempt. One related experiment 
(Webster, Miller, Thompson, & Dav- 
enport, 1952) deserves mention. A 
white noise with octave gaps was 
used to mask tones at frequencies 
corresponding to those in and near 
the gaps. The measurements of the 
masked thresholds seem to suggest 
that Fletcher’s values for the mask- 
ing bands are too small. 

The lack of extensive tests of the 
masking-band hypothesis prevents a 
definitive statement about the valid- 
ity of the hypothesis, and even less 
may be said about the size of the 
bands. Nevertheless the net impres- 
sion one obtains from the literature is 
that a masking band does exist and 
that it may well be the same width
as the critical band.5


OTHER CORRELATES OF THE 
CRITICAL BAND 


We have seen that the function 
relating the critical band to the fre- 
quency at the center of the band is 
derived from four types of experi- 
ment and that the width of the mask- 
ing band may be the same as that of 
the critical band. Of interest, also, 
is the resemblance that the critical- 
band function bears to several other 
functions of frequency: the place of 
maximal displacement on the basilar 
membrane, the difference limen for 
frequency, and the mel scale of sub- 
jective pitch. These similarities have 
been noted elsewhere with respect to 
the critical band (Zwicker et al., 
1957) and also with respect to the 
critical ratio (Fletcher, 1940, 1953; 
von Békésy & Rosenblith, 1951). 

Perhaps the most interesting fact 
about the critical band is that it 
seems to correspond to a constant 
distance of about 1.3 millimeters 
along the basilar membrane. The 
first line in Figure 4 is a slightly 
idealized schematization of the fre- 
quency representation on the basilar 
membrane. The second line shows 
that 24 or 25 critical bands may be 
represented by equal-sized segments 


5 Since the preparation of this article,
Greenwood (1960) has reported an extensive 
study that confirms the suggestion that there 
is a masking band and that it is the same size 
as the critical band. Greenwood measured the 
threshold for pure tones presented in bands of 
white noise. He varied not only the width of 
the bands of noise around a given center fre- 
quency, but also the sensation level of the 
noise and the frequency of the masked tone. 
Investigating bands of noise in five regions of 
the spectrum, he found consistent evidence for 
the existence of a fairly sharp masking band,
the same size as the critical band.


Fig. 4. Representation on the basilar mem-
brane of (1) frequency in kilocycles, (2) criti-
cal bands, (3) pitch (Stevens & Volkmann,
1940), and (4) just noticeable differences for
frequency; the fifth line marks off distance in
millimeters on the basilar membrane. (This
figure is adapted from the book, Das Ohr als
Nachrichtenempfänger, by Feldtkeller and
Zwicker, 1956, p. 60.) (Adapted with per-
mission of S. Hirzel Verlag)


of the membrane. The boundaries of 
the critical bands are not fixed, of 
course, since a critical band may take 
shape around any frequency. 

The mel and the jnd for frequency 
also correspond to constant distances 
on the basilar membrane (see the 
third and fourth lines in Figure 4). 
It is, therefore, not surprising that 
the critical-band function looks very 
much like the functions for the mel 
scale and the jnd scale. Measured in 
mels, the size of the critical band 
varies little, from 100 mels at low 
center frequencies to 180 mels at 
high frequencies. The mel scale is 
not accurate enough, however, to 
distinguish 100 from 180 mels at op- 
posite ends of the scale, so that the 
pitch range of the critical band may, 
in fact, be fairly constant, perhaps 
approximating 150 mels. 

The width of the critical band on 
the basilar membrane is determined 
from the map relating the frequency 
of pure tones to the position of maxi- 
mal stimulation on the membrane 
(von Békésy, 1949). Although no direct physiological measures of the
critical band have been reported, the 
fact that throughout the frequency 
spectrum the critical band corre- 
sponds to a constant length of the 
basilar membrane lends support to 
the notion that this band may be re- 
garded as a fundamental unit of hear- 
ing. 
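The correspondence between critical bands and distance along the basilar membrane can be sketched numerically. The text gives about 1.3 millimeters per critical band and 24 or 25 bands in all. The Python sketch below combines that figure with two analytic approximations published later by Zwicker and Terhardt (1980): one for critical bandwidth, and one for critical-band rate (the number of bands below a given frequency). Those formulas are not part of this article. They are used here only as a convenient stand-in for the curves in Figures 1 and 4. Frequencies are in cps (Hz).

```python
import math

MM_PER_CRITICAL_BAND = 1.3  # approximate figure given in the text


def critical_bandwidth_hz(f_hz: float) -> float:
    """Zwicker & Terhardt (1980) approximation to the critical bandwidth at f."""
    f_khz = f_hz / 1000
    return 25 + 75 * (1 + 1.4 * f_khz ** 2) ** 0.69


def critical_band_rate(f_hz: float) -> float:
    """Zwicker & Terhardt (1980) approximation: number of critical bands below f."""
    f_khz = f_hz / 1000
    return 13 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


for f in (250, 800, 2000, 8000, 16000):
    z = critical_band_rate(f)
    print(f"{f:6d} cps: band ~ {critical_bandwidth_hz(f):4.0f} cps, "
          f"{z:4.1f} bands ~ {z * MM_PER_CRITICAL_BAND:4.1f} mm from the apex")
```

The approximation gives a band of roughly 140 cps at 800 cps, close to the 145 cps cited above. It also gives about 24 bands up to 16,000 cps, which at 1.3 mm each span roughly 31 mm of membrane.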


FUTURE PROSPECTS 


With the experimental basis for 
the critical band reasonably well es- 
tablished, investigators are beginning 
to consider the relevance of the critical band to the loudness of pure
tones, to temporal integration, to 
deafness, to speech perception, and 
to other auditory processes. 

Zwicker (1956, 1958), for example, 
has argued that the loudness of an in- 
tense pure tone is a composite loud- 
ness because the displacement of the 
basilar membrane is spread over 
many critical bands. Zwicker as- 
sumes that the “loudnesses” corre- 
sponding to these critical bands sum- 
mate to give the total loudness of the 
tone. Similar assumptions underlie 
Zwicker’s (1958) system for the ob- 
jective calculation of the loudness of 
a complex noise. The loudness of a 
noise is assumed to equal the sum of 
the individual loudnesses of the com- 
ponent critical bands after allowance
for mutual masking effects among the
bands.
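Zwicker's summation principle can be illustrated in a few lines. For illustration only, the Python sketch below assumes that the loudness contributed by each critical band grows as a power function of the intensity in that band, with an exponent of 0.3 (roughly the exponent of the sone scale). It also ignores the mutual masking among bands that Zwicker's procedure allows for, so it is not Zwicker's calculation method. Under these assumptions, spreading a fixed amount of energy over several critical bands makes the sound louder than concentrating it in one. This is the loudness-summation effect.

```python
def band_loudness(intensity: float, exponent: float = 0.3) -> float:
    """Hypothetical per-band loudness: a power law in relative intensity (arbitrary units)."""
    return intensity ** exponent


def total_loudness(band_intensities, exponent: float = 0.3) -> float:
    """Sum the loudnesses of the component critical bands (mutual masking ignored)."""
    return sum(band_loudness(i, exponent) for i in band_intensities)


total = 1000.0                            # hypothetical total intensity, arbitrary units
narrow = total_loudness([total])          # all the energy inside one critical band
wide = total_loudness([total / 4] * 4)    # the same energy spread over four bands
print(f"spreading over 4 bands multiplies loudness by {wide / narrow:.2f}")
```

With an exponent below 1, four bands each carrying a quarter of the energy sum to 4^0.7, or about 2.6 times the loudness of a single band. With an exponent of exactly 1, spreading the energy would make no difference.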

Other investigators are studying 
temporal integration for short tone 
pulses (cf. Plomp & Bouman, 1959). 
Since short tone pulses are in effect 
multicomponent complexes whose 
bandwidth varies with time, the in- 
tegration of energy at threshold 
would be expected to occur within 
the critical band. 

Clinical use of the critical band has 
been attempted by deBoer (1960) in 
the diagnosis of hearing loss. His re- 
sults suggest that the critical-band 
mechanism may be disturbed in cer- 
tain kinds of deafness. The related 
problem of individual differences for 
the critical band has remained essen- 
tially uninvestigated except for some 
observations by Niese (1960) and 
indications from earlier data (e.g., 
Gissler, 1954) that the size of the 
critical band may vary from person 
to person, just as thresholds do. 

Although no answers have yet 
come forth, phoneticists are begin- 
ning to ask about the role of the crit- 
ical band in the perception of speech. 
Musicians may soon add their 
problems. The quest has begun in 
earnest. Now that a fundamental 
unit of hearing has been identified, 
it remains to discover its role in all 
the many processes called hearing. 
PERSEVERATIVE NEURAL PROCESSES AND CONSOLIDATION OF THE MEMORY TRACE1


STEPHEN E. GLICKMAN 
Northwestern University 


For a short period between the 
turn of the century and the first 
world war, theories of perseveration 
figured prominently in attempts to 
understand many of the newly dis- 
covered phenomena of learning and 
forgetting. Although the exact lines 
of speculation varied from one writer 
to the next, in general, a neural 
fixation process was assumed to con- 
tinue after the organism was no 
longer confronted with the stimuli to 
be learned. This fixation process was 
deemed crucial to efficient retention,
and interference with perseveration
was presumed to have an adverse ef- 
fect on an organism’s ability to re- 
member stimuli to which it had been 
exposed. 

The first clear statement of such a 
consolidation theory is generally at- 
tributed to Müller and Pilzecker 
(1900). In order to account for the 
existence of retroactive inhibition, 
Müller and Pilzecker postulated the 
existence of a neural perseverative 
process, subject to external inter- 
ference and requisite to the consolida- 
tion of the memory trace for recently 
acquired material. Although knowl- 
edge of the physiology of brain func- 
tion was still quite limited, Müller 


1 The preparation of this paper was made by
the author at the Department of Physiology
and Biophysics, University of Washington 
Medical School, during the summer of 1959, 
and supported by Grant 2-B5082 from the 
National Institute of Neurological Diseases 
and Blindness. The author is indebted to 
C. P. Duncan, S. M. Feldman, T. Kennedy, 
and D. Kimura for critically reading the 
manuscript. 
and Pilzecker nevertheless attempted
to be as precise as possible regarding 
the neural locus of perseveration. 
They rejected the notion that per- 
severation was in any way analogous 
to sense organ processes such as those 
believed to underlie the negative 
afterimage, on the grounds that these 
sensory processes were of too short 
duration. On the other hand, the per- 
severation which Müller and Pilz- 
ecker observed did appear to be sim- 
ilar to the repetitious or stereotyped 
behavior resulting from diseases of 
the subcortical motor centers. It was 
with these latter structures that 
Müller and Pilzecker associated per- 
severative activity. 

Numerous other psychologists were 
concerned with perseveration theory 
during the early 1900s. Among 
these, DeCamp (1915) advanced 
what was probably the most detailed 
piece of pseudoneurological specula- 
tion: 

From the neurological standpoint, in the 
learning of a series of syllables, we may as- 
sume that a certain group of synapses, nerve 
cells, nerve paths, centres, etc., are involved. 
Immediately after the learning process the 
after-discharge continues for a short time, 
tending to set associations between just 
learned syllables. Any mental activity en- 
gaged in during this after-discharge, involv- 
ing or partially involving the same neuro- 
logical group, tends, more or less, to block the
after-discharge, and give rise to retroactive
inhibition (p. 68).


Some years previously, Sherrington
(1906) had described the phenom-
enon of afterdischarge in spinal re-
flexes and discussed the blockage of
such discharges by subsequent stim-
uli. It is interesting to note that
this provided the theoretical model
for DeCamp's view of perseverative 
processes in much the same manner 
as Sherringtonian physiology gen- 
erally shaped the psychologists’ con- 
ception of neural activity (see Hebb, 
1951). 

As a behavioral theory of retroac- 
tive inhibition, however, persevera- 
tion theory met with many difficul- 
ties, and was eventually replaced by 
the current concepts of associative 
interference (McGeoch & Irion, 1952; 
Osgood, 1953), although it continued 
to receive some limited support as 
a possible factor in forgetting (Wood- 
worth, 1938). Ultimately, a per- 
severation theory, erected on the 
basis of inferences from behavior, was 
no longer viable once the behavioral 
observations were either shown to be 
false or explained more parsimoni- 
ously by other hypotheses. The re- 
juvenation of this theory awaited di- 
rect support from neurology. 

Lashley (1918) once made the fol- 
lowing comment on perseveration 
theory: 


If there is a gradual strengthening of associa- 
tions during periods of nonpractice, there is 
implied a continuation of chemical changes 
within the nerve cells, initiated by the passage 
of a neural impulse through new channels and 
persisting for hours or even days without the 
influence of continued impulses. The experi- 
mental evidence upon which the belief in a 
gradual fixation of associations is based is far 
from convincing... it all can be explained 
equally well by other hypotheses and, in view 
of the extreme importance of the point for 
physiological explanation, we should be care- 
ful not to accept the assumption of a gradual 
setting of new functional connections until 
some real evidence is advanced to support it 
(pp. 363-364).


This healthy skepticism was cer- 
tainly justified, although even at the 
time some physiological evidence was 
available to buttress perseveration 
theory. 
RETROGRADE AMNESIA


Shortly after the publication of 
Müller and Pilzecker’s work, Mc- 
Dougall (1901) called attention to 
the applicability of their persevera- 
tion theory to the explanation of 
retrograde amnesia (RA) resulting 
from cerebral trauma. However, 
Burnham (1904) was apparently the 
first individual to extensively discuss 
the relationship between RA and per- 
severative “consolidation” amnesia. 
Burnham’s paper involved an anal- 
ysis of two cases of retrograde am- 
nesia. Both of these subjects had sus- 
tained head injuries as the result of 
accidents and in both cases there was 
a loss of memory for events occurring 
during the period preceding the ac- 
cident. As the result of his studies of 
these cases and of others cited by 
Ribot (1892), Burnham suggested 
that 
The fixing of an impression depends upon a 
physiological process. It takes time for an 
impression to become so fixed that it can 
be reproduced after a long interval; for it 
to become part of the permanent store of 
memory considerable time may be neces- 
sary. This we may suppose is not merely a 
process of making a permanent impression up- 
on the nerve cells, but also a process of associ- 
ation, of organization of the new impressions 
with the old ones (p. 392). 


He further speculated that: (a) the 
time required for this fixation process 
may vary with individuals and con- 
ditions; (b) shock produces its effects 
by arresting the fixation process in 
the nervous tissue; (c) such shock 
may be produced by great fatigue, 
excitement, unconsciousness, or nar-
cotics; (d) RA is not all-or-none and
the extent of the amnesia is relative
to the amount of time elapsing before
the fixation process is interrupted;
and finally (e) that automatic ac-
tivity is an important factor in fixing
impressions although it may not
necessarily be directly observable in
terms of movements.

These remarkable observations 
would appear to have been borne out 
by recent experiments in nearly every 
case and we can now advance these 
propositions with much more con- 
fidence. 

During the first 4 decades of this 
century, the phenomenon of RA con- 
stituted the only direct physiological 
evidence for the existence of a neural 
fixation process. Early references to 
it are to be found in Ballard (1913), 
Pillsbury (1913), DeCamp (1915), 
and others. Although a complete re- 
view of this literature is beyond the 
scope of the present paper, it is per- 
haps worthwhile to examine the re- 
sults of a comprehensive study by 
Russell and Nathan (1946). In a survey of 1,029 cases of head injury, only
133 were found to have experienced 
no RA whatsoever. Seven hundred 
and seven reported amnesia for 
events occurring from several sec- 
onds to 30 minutes preceding the 
injury, while 133 reported RA of 
more than 30 minutes' duration. Records were unavailable for 56 patients in the sample. Russell and
Nathan noted that the duration of 
RA is “in most cases a few moments 
only.” Since the use of barbiturate 
hypnosis reduced the period of RA in 
only 6 of 40 cases, and produced no 
data suggestive of hysterical repres- 
sions, the authors conclude that loss 
of the material is due to a blocked 
perseveration process: 

It seems that the mere existence of the brain 
as a functioning organ must strengthen the 
roots of distant memories. The normal ac- 
tivity of the brain must steadily strengthen 
distant memories so that with the passage of
time these become less vulnerable to the
effects of head injury (p. 299).2
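The Russell and Nathan figures can be checked and expressed as proportions. The four categories quoted above account for all 1,029 cases: 133 with no RA, 707 with RA of several seconds to 30 minutes, 133 with RA of more than 30 minutes, and 56 without records. The short Python tally below computes each category as a percentage of the 973 cases with records. It is a simple illustration, not part of the original analysis.

```python
# Counts from Russell and Nathan (1946), as quoted in the text.
cases = {
    "no RA": 133,
    "RA from seconds to 30 minutes": 707,
    "RA over 30 minutes": 133,
    "no record": 56,
}
total = sum(cases.values())
assert total == 1029  # the categories account for every case in the survey

with_records = total - cases["no record"]
for label, n in cases.items():
    if label != "no record":
        print(f"{label:32s} {n:4d}  {100 * n / with_records:5.1f}% of {with_records}")
```

On these counts, roughly 73% of the patients with records fell in the seconds-to-30-minutes category, and about 14% reported no RA at all.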


2 Coons and Miller (1960) have recently
called attention to the possibility of sampling
artifacts confounding the consolidation inter-
pretation of clinical observations of retro-
grade amnesia. Thus, they have pointed out
that, if an injury produces a general decre-
ment in memory, positive evidence for
memory is more likely to be secured while
examining the larger time samples involved
in remote memories as compared to recent
memories.

Experimentally induced RA has
produced the best evidence for the
existence of a consolidation process
since the results would be predictable
from perseveration theory, while the
primary competing theory, the asso-
ciative interference theory, has no
explanation to offer. We will therefore
turn now to a review of the various
experimental procedures used to in-
duce RA and the results obtained.


Electroconvulsive Shock


The introduction of electroshock
therapy in 1937 provided both the
impetus and the technical apparatus
for the laboratory study of RA. Im-
mediately after its introduction many
practitioners observed that electro-
convulsive shock (ECS) produced a
temporary postshock amnesia which
eventually shortened to a genuine RA
for events immediately preceding the
shock treatment. Zubin and Barrera
(1941) were the first investigators to
subject these observations to sys-
tematic study. They trained 10 pa-
tients in a series of paired-associate
lists to a criterion of two consecutive
correct repetitions. Learning occurred
either in the morning or evening,
while the retention tests were given
during the subsequent afternoon.
The same subjects were used in con-
trol and experimental conditions, i.e.,
(a) with no shock intervening be-
tween learning and the retention test,
and (b) with an ECS interpolated
after the morning learning session.
With no intervening shock there
were significant savings between
learning and relearning; with an in-
terpolated ECS there were no signif-
icant savings. A comparison between
the effects of ECS on material learned 
the evening prior to shock with ma- 
terial learned the morning preceding 
shock indicated that recent material 
was more severely affected by ECS 
than remote material. The latter 
conclusion was based on rather small 
differences in savings scores and in- 
sufficient data are presented to per- 
mit adequate statistical evaluation. 
However, Flescher (1941), Williams 
(1950), and Cronholm and Molander 
(1958) have subsequently confirmed 
the substance of Zubin and Bar- 
rera’s assertions. The various in- 
vestigators using human subjects, al- 
though successfully employing ECS 
to interfere with memory, had not at- 
tempted to adequately define the 
time relations of such interference. 
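The savings scores on which Zubin and Barrera's comparisons rest are conventionally computed as the proportion of the original learning effort that is not needed when the material is relearned. The text does not give their exact formula. The Python sketch below therefore uses the conventional trials-based version. The function name and the trial counts are hypothetical, chosen only to show the contrast between a retention test with an interpolated ECS and one without.

```python
def savings_percent(original_trials: int, relearning_trials: int) -> float:
    """Conventional savings score: the share of original learning trials not needed on relearning."""
    if original_trials <= 0:
        raise ValueError("original_trials must be positive")
    return 100.0 * (original_trials - relearning_trials) / original_trials


# Hypothetical trial counts for one patient, not Zubin and Barrera's data.
print(savings_percent(12, 6))    # retention test with no intervening shock
print(savings_percent(12, 12))   # retention test after an interpolated ECS
```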

This critically important step was 
taken by Duncan (1949). Duncan’s 
procedure involved training rats to 
avoid shock to the feet in a shuttle- 
box situation. A light, turned on 10 
seconds prior to grid shock, served as 
the conditioned stimulus (CS). The 
animals received one trial per day for 
18 days and records were kept of the 
number of successful avoidance re- 
sponses. Nine groups of animals were 
used in the study. Rats in eight of 
these groups received an ECS after 
each day’s trial, the trial-ECS in- 
terval ranging from 20 seconds to 14 
hours. In the remaining group, the 
ear clips used for delivering the ECS 
were applied following each day's 
trial but no current was passed. The 
results clearly indicated a deleterious 
effect of ECS on performance, the 
magnitude of the effect decreasing as 
the trial-ECS interval increased to 
produce a negatively accelerated 
curve. This general finding has since 
been confirmed by Ransmeier (1953), 
Thompson and Dean (1955), and
Leukel (1957). All of the findings are
compatible with the view that a
single ECS can produce deficits in re-
tention if delivered within 15 to 60
minutes following a learning trial.
Moreover, ECS induced immediately
following the learning trial effectively
obliterates nearly all retention of the
"learned" response.

The studies following Duncan's have employed dif-
ferent learning tasks. Leukel (1957)
and Ransmeier (1953) used maze
learning situations with the ECSs
being delivered at varying posttrial
intervals. Thompson and his collab-
orators have employed a visual dis-
crimination learning task, with avoid-
ance of grid shock as the motivating
agent. In these latter studies
(Thompson, 1957a; Thompson &
Dean, 1955; Thompson & Penning-
ton, 1957) a single ECS was ad-
ministered at various intervals fol-
lowing a series of massed trials in the
apparatus. As a result of these ex-
tensive experiments, it has been de-
termined that ECS produces greater
deficits in young than in adult rats
(Thompson, 1958a; Thompson, Har-
avey, Pennington, Smith, Gannon, &
Stockwell, 1958). Further, rats suf-
fering from anoxia-induced brain
damage (Pennington, 1958) show
greater deficits resulting from a single
ECS than intact control animals.
Both the findings with respect to age
and those relating to brain damage
are compatible with Thompson and
his co-workers' (1958) hypothesis
that the extent of the deficit will be
proportional to the number of cor-
tical neurons available. Pennington
(1958) has alternatively suggested that
the results obtained with brain-dam-
aged rats may be a function of a pro-
longed perseveration process in these
animals.
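The "negatively accelerated" shape of Duncan's gradient can be illustrated with a simple function. For illustration only, the Python sketch below assumes that performance, relative to the group fitted with ear clips but given no current, recovers exponentially as the trial-ECS interval grows. Both the exponential form and the 15-minute time constant are hypothetical choices, not values fitted to Duncan's or any later data. The sketch reproduces only the qualitative pattern described above: near-total loss when the ECS immediately follows the trial, rapid recovery over the first hour, and little further change at longer intervals.

```python
import math


def relative_performance(interval_min: float, tau_min: float = 15.0) -> float:
    """Hypothetical gradient: performance as a fraction of the control group's
    when the ECS follows the trial by `interval_min` minutes."""
    if interval_min < 0:
        raise ValueError("interval must be non-negative")
    return 1.0 - math.exp(-interval_min / tau_min)


# Intervals spanning Duncan's range, from 20 seconds to 14 hours (in minutes).
for t in (20 / 60, 5, 15, 60, 240, 14 * 60):
    print(f"{t:7.1f} min -> {relative_performance(t):.2f}")
```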
Thompson and Pennington (1957)
have also found that the memory
decrement produced by a single ECS
was less after spaced trials than after
massed trials. This result was ex-
pected from the point of view of a
perseveration theory as a joint func-
tion of "firmer fixation of mem-
ory trace owing to a longer duration
of perseveration" and "the lessened
intensity of perseveration at the end
of training due to dissipation of per-
severative activity."

Although the empirical result of 
interference with performance by 
postlearning ECS has not been ques- 
tioned, the interpretation of the re- 
sults is not quite as clear. The points 
to be discussed below actually raise 
questions of interpretation which 
apply not only to the ECS proce- 
dures but to other interpolated phys- 
iological procedures as well. 

1. The most serious alternative to 
a consolidation interpretation of the 
ECS results has been offered by 
Miller and Coons (1955). These investigators trained rats to eat in a
runway and then shocked them while 
eating there. Avoidance was meas- 
ured by an increased latency of ap- 
proach to the eating place. ECSs 
were delivered to the animals at 
varying intervals after shock to the 
mouth. Miller and Coons reasoned 
that any aversive qualities of the 
ECSs might be expected to produce 
increased avoidance. On the other 
hand, if the ECS really interrupted 
consolidation, the subjects would 
show the opposite behavior, namely, 
approaching the food without hesita- 
tion. In this experiment no evidence 
was found for an attenuation of the 
avoidance response by the ECS,
leading the authors to argue that the
retardation in learning observed by
Duncan (1949) was simply a function
of placing the rat in a conflict situa-
tion. In a more recent set of experi-
ments, Coons and Miller (1960) have
succeeded in opposing the conflict
and consolidation interpretations in a
double grid-box situation similar to
that used by Duncan. Here again,
their results indicate that ECS may
not eliminate memory but merely in-
duce anxiety or conflict which in-
hibits performance of the response in
question. They further buttress their
contentions regarding the fear in- 
ducing qualities of ECS with obser- 
vations on increased defecation, uri- 
nation, and weight loss in those ani- 
mals for whom the performance of an 
otherwise rewarded response is fol- 
lowed by an ECS. In both of their 
studies, the ECS apparently sum- 
mated with the grid shock to produce 
a result which significantly favored a 
conflict as opposed to a perseveration 
interpretation. Observations of Gal- 
linek (1956) suggest that analogous 
anxiety builds up in human beings 
during the course of electroshock 
therapy. Such an interpretation is 
logically possible for both the avoid- 
ance situations used by Duncan and 
by Thompson and Dean, and the 
maze learning situations used by 
Leukel and by Ransmeier and Ger- 
ard. The standard control for this 
has been to employ groups receiving 
painful but nonconvulsive shocks. In 
these cases (Duncan, 1949; Leukel, 
1957; Ransmeier & Gerard, 1954) it 
has been found that (a) the decre- 
ments produced by the painful but 
nonconvulsive shocks are not nearly 
as severe as those produced by ECS 
at comparable intervals, and (b) the 
posttrial interval during which pain- 
ful shocks produced their effect was 
always much shorter than that dur- 
ing which significant decrements 
could be produced by ECS. These 
latter control results would seem to 
indicate that the ECS results are due
to more than just conflict. It might
be argued, however, that the ECSs
are sufficiently more painful or un-
pleasant than the leg or tail shocks to
account for the greater deficits pro- 
duced by the former. Reference to 
human subjects would suggest that 
this is not the case. Patients do not 
necessarily report pain as an ac- 
companiment of a properly delivered 
ECS (Stainbrook, 1948). In view of 
this, and considering that there is no 
consistent experimental evidence for 
punishment obliterating verbal ma- 
terial (Rapaport, 1942), it seems un- 
likely that the deficits observed in 
humans following ECS (or any cere- 
bral trauma) can be explained purely 
in conflict terms. Finally, in regard 
to the animal literature, it seems 
reasonable to point out that Miller 
and Coons delivered a series of ECSs 
on successive days, whereas Thomp- 
son and his collaborators eliminated 
a persistently rewarded response with 
a single ECS. 

In order to explain Thompson's re- 
sults in conflict terms, one would 
have to assume a delay of reinforce-
ment gradient lasting at least 60 min-
utes, and the buildup of a significant
amount of fear following a single 
ECS. Such assumptions, although 
possible, would not be easy to sup- 
port at the present time. Clearly, 
however, other workers should carry 
out experiments utilizing designs 
similar to those employed by Miller 
and Coons, i.e., opposing the con-
solidation and conflict interpretations.
The writer has used such a procedure 
in an experiment involving direct 
stimulation of the brain (Glickman, 
1958) and this could easily be 
adapted for ECS. Moreover, the one- 
trial learning situation employed in 
this latter experiment would permit 
the use of a single ECS and enable
an exceedingly accurate estimate 
of the trial-ECS interval.³

2. An alternate interpretation of 
the ECS results is also possible in 
those studies employing food reward. 
As Kohn (1951) and Berkun, Kessen, 
and Miller (1952) have shown, the 
rewarding properties of food are de- 
rived in part from stimulation of re- 
ceptors within the mouth, and in 
part from actions within the stom- 
ach. ECS delivered shortly after a 
learning trial might act to prevent 
the perception of the feedback from 
the stomach and thereby cut down on 
the reinforcing properties of the food. 
In view of the relatively minor con- 
tribution of these stomach receptors, 
particularly in the early stages of 
learning, such effects are probably 
insignificant in the studies of Rans- 
meier (1953) and Leukel (1957). 

3. A question has arisen about dis- 
tinguishing between ECS effects on 
a time-limited consolidation process 
and the more generalized memory 
deficits which have been observed to 
follow a series of ECSs (see Stain- 
brook, 1946, for review). In particu- 
lar, Worchel and Gentry (1950) have 
suggested that Duncan's (1949) find- 
ing of a limited period following 
learning when an ECS will be effec-
tive is a result of his failure to use
massed ECSs. On the basis of some 
T maze data of their own, Worchel 
and Gentry argue that Duncan might 


3 Since this article went to press, there have 
been two reports of experiments in which the 
conflict and consolidation interpretations of 
“forgetting” have been o] in o a 
learning situations. In both of these cases, in 
which the introduction of various C 
agents served as the interpolated procedure, 
the results favored a consolidation interpre- 
tation of the effects (Essman & Jarvik, 
1960; Pearlman, Sharpless, & Jarvik, 1961). 
have considerably extended the dura-
tion of time during which ECS would
produce a deficit if he had given a
series of ECSs. Worchel and Gen- 
try's results do not contradict the 
general finding that it is easier to 
disrupt learning in the period im- 
mediately following exposure to the 
learning situation. However, at the 
present time, the ECS data are com-
patible with the notion that the
strengthening of memory traces is a
continuous process throughout the life of
the organism. For example, Brady
(1951, 1952) has found evidence of
spontaneous growth in the strength
of a conditioned emotional response 
during a period of 90 days. On the 
basis of current evidence, one might 
expect that the interval following a 
learning trial, during which time in- 
terference with retention can be 
produced, is a direct function of the 
degree of physiological severity of 
the interpolated procedure. 
Ultimately, there is probably some 
practical limit on the time interval 
between learning and ECS during 
which selective effects on retention 
can be produced. In addition, the 
effects of a series of ECSs delivered 
many hours or days after learning 
are often apparently temporary
(Stainbrook, 1946). Brady (1951) re-
ported that a series of ECSs sup-
pressed a conditioned emotional re-
sponse (CER) for a period of a
month, although the habit reap- 
peared spontaneously at the end of 
that time. It has also been found 
that the effects of a series of ECSs
may be selective for emotional re- 
sponses (Geller, Sidman, & Brady, 
1955). On the basis of the accumu- 
lated data, it seems reasonable to 
suggest that ECS may affect per- 
formance in a number of ways in- 
cluding: (a) a temporary suppressor 
action involving those cerebral struc- 


STEPHEN E. GLICKMAN 


tures mediating pain or anxiety re- 
sponses (such a mode of action would 
explain the proactive effects noted 
by Poschel, 1957, and Carson, 1957, 
on avoidance conditioning) and (b) 
a direct action on the neural circuits 
involved in memory which, if the 
learning-ECS interval is brief enough 
and the treatment sufficiently severe, 
may permanently erase the effects of 
such learning. 


Anoxia 


Hayes (1953) demonstrated equiv- 
alent retroactive effects of anoxia 
and ECS on maze learning in rats. 
He used a distributed practice proce- 
dure and administered the experi- 
mental treatment one hour after each 
trial. The experimental rats showed 
similar retardation in learning when 
their acquisition curves were com-
pared with those of normal control animals.
Hayes reports that histological ex- 
amination of the brains produced no 
clear evidence of brain damage for 
any of the animals. Ransmeier and 
Gerard (1954) have also reported 
disturbances in maze learning result- 
ing from anoxia, the magnitude of 
the disturbance decreasing “along 
characteristic curves with increasing 
intervals between training and ex- 
perimental procedures.” 

Using a discrimination learning 
procedure, Thompson and Pryer 
(1956) showed that anoxia, produced 
by placing rats in a decompression 
chamber during the postlearning pe-
riod, could lead to decrements in
retention analogous to those pro- 
duced by ECS. In a later study, 
Thompson (1957a) found that a 10- 
minute exposure to a simulated 
30,000-foot altitude produced deficits 
equivalent to those resulting from 
ECS, although exposure to a 20,000- 
foot altitude did not produce such 
severe effects. Finally, Thompson 
(1957a) has also reported that when
an ECS was given 30 seconds post- 
training, a subsequent 10-minute ex- 
posure to a simulated 30,000-foot 
altitude did not produce an addi- 
tional deficit. 


Temperature 


A number of investigators have 
studied the effects of postlearning 
temperature on retention. In most 
of the earlier work (French, 1942; 
Hunter, 1932; Jones, 1943) the aim 
was to reduce the activity of the ex- 
perimental group and thereby reduce 
retroactive inhibition. Considered in 
the light of the ECS literature, these 
studies are not immediately relevant 
to the present review because of the 
prolonged interval between the learn- 
ing trials and the achievement of the 
desired temperature change. 

In the most recent studies of Cerf 
and Otis (1957) and Ransmeier and 
Gerard (1954) it appears that tem- 
perature may have some effect on 
processes related to consolidation. 
The former investigators gave gold- 
fish 10 massed trials in an avoidance 
situation using a shifting light as the 
CS. At varying intervals following
the trials (0 minutes, 15 minutes, 60
minutes, or 4 hours), the body tem-
peratures of different groups of 15
to 19 subjects were raised briefly to a 
point sufficient to induce heat narco- 
sis (36.5°-37.0° C). In retention tests 
carried out the next day, the criterion 
of five consecutive correct responses 
in 10 trials was met by only 10.5% 
of the group narcotized immediately 
after learning, while 56.2% of the 
subjects paralyzed 4 hours following 
learning met the same criterion. The 
remaining two groups occupied inter- 
mediate positions. Fifty percent of a 
group of untreated control subjects 
also met the above criterion. Thus, 
the temperature-induced narcosis
produced much the same effect in the
goldfish that ECS and anoxia have 
been found to produce in rodents. 
Ransmeier and Gerard (1954) did not 
find any evidence of retroactive ef- 
fects of lowered body temperatures 
on retention of a maze habit in the 
hamster. Gerard (1955) has reported, 
however, that lowering the body 
temperature will apparently prolong 
the period during which an ECS may 
produce severe deficits. Thus, “ham- 
sters kept cool between learning and 
electroshock show as great a disrup- 
tion of learning at an interval of one 
hour as warm ones do at an interval 
of fifteen minutes.” Evidently, tem-
peratures sufficient to impair spon-
taneous activity in the brain as indi-
cated by the EEG will not act directly
to block consolidation, although they 
may slow down the chemical pro- 
cesses involved in the fixation of the 
trace. 

Fay (1940) has reported RA in 
human subjects for events occurring 
while the patients were refrigerated, 
i.e., when the body temperature fell 
below 33.3° C. Under these circum- 
stances, the subjects could respond 
to questions and carry on a conversa- 
tion, although interrogation after 
the refrigeration procedure showed a 
loss of memory for the entire inter- 
change. Such deficits could be ex- 
plained in terms of an impairment of 
activity in those structures responsi- 
ble for the consolidation process. 
However, alternative explanations 
are also possible. 


Anesthesia 


Leukel (1957) has reported that 
sodium pentothal injected intraperi- 
toneally (IP) after each learning 
trial impaired acquisition in a maze 
in experimental rats when their time 
or error scores were compared with
those of any of three control groups. Subjects
in the three control groups received
either: an IP injection of water fol- 
lowing each trial, an IP injection of 
pentothal 30 minutes following each 
trial, or no injection. The scores did 
not differ among these latter groups. 
Leukel interpreted his results in 
terms of interruption of consolidation 
of the memory trace in those sub- 
jects receiving pentothal one minute 
after each trial. 

On the other hand, Russell and 
Hunter (1937) and Ransmeier and 
Gerard (1954) have not found deficits 
in retention to result from postlearn- 
ing barbiturate anesthesia. There 
are numerous differences in proce- 
dure, however, which might account 
for this discrepancy. For example, 
Russell and Hunter (1937) admin- 
istered sodium amytal subcutane- 
ously after giving their experimental 
subjects five massed trials in a maze. 
They observed no effects of the in- 
jection on subsequent retention of 
the maze. However, the subcutane- 
ous route of injection undoubtedly 
prolonged the time before the drug 
took effect (in comparison with the 
IP route used by Leukel). In addi- 
tion, the massed trials procedure 
used by Russell and Hunter resulted 
in a longer interval between learning 
and anesthesia than the Leukel pro- 
cedure of injecting one minute fol- 
lowing each trial. 

Ransmeier and Gerard (1954) and 
D. Kimura and S. E. Glickman (un-
published) failed to find retention
deficits as the result of anesthetizing
hamsters or rats with ether following
maze learning trials or electric shock
in an avoidance learning situation,
respectively. These results suggest
that the apparent effectiveness of
barbiturates, as opposed to ether, in
blocking consolidation may be due to
secondary effects of the former on
blood chemistry or blood pressure
rather than direct synaptic interfer-
ence. Barbiturate anesthetics pro-
duce more severe blood changes
than ether does, including reductions in
blood pressure and blood sugar level
(Kohn, 1950).

If anesthetics can be shown to 
exert reliable retroactive effects on 
learning, they may eventually prove 
useful in the localization of the neural 
structures crucial to consolidation. 
Techniques have recently been de- 
veloped which permit the delivery of 
small quantities of various drugs to 
restricted sites within the brain of a 
“behaving” animal (Fisher, 1956; 
Olds & Olds, 1958). Utilizing such 
techniques, it should be possible to 
selectively and temporarily block 
activity in various cerebral struc- 
tures during the period immediately 
following exposure to the learning 
situation and thereby determine 
which structures, if any, are crucial 
to the consolidation process. 


Brain Stimulation 


Mahut (1958), Glickman (1958), 
and Thompson (1958b) have reported 
retroactive effects of brain stimula- 
tion on learning. The stimulation was 
accomplished with chronically im- 
planted electrodes which permit the 
animal freedom of movement in the 
learning situation, but enable the 
experimenter to deliver a small elec- 
tric current to particular sites within 
the CNS at any chosen time. This 
technique enables much more specific 
delimitation of the structures in- 
volved in the presumed fixation pro- 
cess than, for example, ECS or 
anoxia. However, in the studies 
carried out thus far, there are nu- 
merous factors which serve to com- 
plicate comparisons among the stud- 
ies, as well as to rule out any simple 
“consolidation” interpretation of the 
results. 

Mahut (1958) tested the effects of
stimulation of the nonspecific tha- 
lamic nuclei on the performance of 
rats in a Hebb-Williams maze. Brief 
bursts of 60-cycle, sine wave, 0.25- 
volt stimulation were delivered 
through implanted electrodes while 
the rat was eating in the goal box. 
Such stimulation produced poorer
performance in the maze when the
error scores of these “thalamic” rats
were compared with those of rats re-
ceiving either no stimulation or simi-
lar stimulation of the midbrain teg-
mentum. The possibility exists in
this study that the effects of stimula- 
tion were not retroactive but con- 
temporary, i.e., interfered with the 
animals’ registration of the food 
reward. This might be clarified by a 
parametric investigation of the time 
interval between learning trial and 
stimulation, following the design of 
the ECS studies (Duncan, 1949; 
Thompson & Dean, 1955). 

Glickman (1958) examined the 
effects of stimulation of the midbrain 
portion of the arousal system on the 
acquisition of an avoidance habit in 
the rat. Three 20-second bursts of 
stimulation, at considerably higher 
voltages than those used by Mahut 
(1958), were delivered immediately 
following shock to the mouth while 
the subjects were eating at a distinc- 
tive metal food spout. In retention 
tests carried out the following day, 
the animals who had received re- 
ticular stimulation after mouth-shock 
showed less avoidance of the spout 
(more eating behavior) than control 
animals not receiving brain stimula- 
tion. The interpretation of this study 
is also complicated by the par-
ticular characteristics of the Hudson
(1950) one-trial learning apparatus 
which evidently lead to a portion of 
the avoidance response being learned 
in the postshock period. Hudson has 
reported that the visual scanning
which the animal engages in during 
the postshock period will reinforce 
the avoidance response. Thus, it is 
conceivable that the reticular stimu- 
lation could have simply interfered 
with an ongoing visual process rather 
than retroactively interfering with 
previous learning. 

Thompson (1958b), in an ingeni- 
ously designed study which permits 
him to use each animal in a variety 
of experimental conditions, has re- 
ported interference with the per- 
formance of cats in an alternation 
task as the result of intracranial 
stimulation. This effect was achieved 
with bilateral stimulation of the 
caudate nucleus following each trial 
in a modified Wisconsin General Test 
Apparatus. Similar stimulation of 
the midbrain tegmentum did not 
produce the retroactive effect, al- 
though it did interfere with perform- 
ance when the stimulation was de- 
livered either before or during a given 
trial. In this case, the interpretation 
of the retroactive disruptive effects 
of caudate stimulation is complicated 
by the possible reinforcing properties 
of this stimulation. Brady, Boren, 
Conrad, and Sidman (1957) have 
reported positively reinforcing con- 
sequences of caudate stimulation in 
the cat. It seems plausible that 
stimulation in this region, following 
a particular response, would favor 
repetition of that response and might 
act in opposition to any alternation 
habit. Such an explanation might be 
an alternative to postulating inter- 
ference with a perseveratory process. 
Since it is possible to check on the 
rewarding properties of electrical 
stimulation, using a self-stimulation 
situation such as that used by Olds 
and Milner (1954), this factor could 
be easily controlled in future studies. 
In regard to the lack of effect of teg-
mental stimulation, this may be ex-
plicable in terms of the extensive
functional localization of reinforce- 
ment pathways which appears to 
exist in that region (Glickman, 1960; 
Olds & Peretz, 1959). Olds⁴ has sug-
gested that the interference produced
by intracranial stimulation in learn- 
ing situations may be directly related 
to the reinforcing qualities of the 
stimulation. 

There are numerous studies dem- 
onstrating interference with learning 
as a result of intracranial stimulation 
(see Zeigler, 1957, for review). How- 
ever, most of these are not directly 
interpretable in terms of retroactive 
interference because the stimulation 
is delivered during the actual per- 
formance of the task. Nevertheless, 
as Thompson (1958b) suggests, inter- 
ference with consolidation may be at 
least a partial explanation of the 
deficits observed by Rosvold and 
Delgado (1956) coincident with cau- 
date stimulation. Similarly, Burns 
and Mogenson (1958) and Burns and 
Stackhouse (1959) have reported 
deficits in the acquisition of a bar 
pressing habit in the Skinner Box 
resulting from cortical stimulation.
As Burns and Stackhouse note, these 
results are compatible with a per- 
severation hypothesis. 


PHYSIOLOGICAL SUBSTRATE OF 
CONSOLIDATION 


Stellar (1957) has pointed out that 
physiological data have recently ac- 
cumulated which tend to support the 
existence of a system within the brain 
responsible for the permanent fixation 
of memory traces. Milner and Pen- 
field (1955) and Scoville and Milner 
(1957) have reported cases of tem- 
poral lobe ablation in man which 
produced severe impairment of the 
ability to acquire new material post-
operatively, although preoperatively
acquired material was retained. Al- 
though the crucial structures have 
not yet been definitely localized, the 
hippocampus and amygdala appear 
to be directly involved. Along similar 
lines Brady, Schreiner, Geller, and 
Kling (1954) found interfering effects 
of amygdalectomy on the acquisition 
of an avoidance response in cats, al- 
though the same lesions produced in 
cats which had already acquired the 
habit led to no disturbance in per- 
formance. The anatomical and physi- 
ological data suggest numerous path- 
ways through which these relatively 
primitive temporal lobe structures 
could exert widespread effects on the 
remainder of the brain (Adey, Mer- 
rillees, & Sunderland, 1956; Green & 
Adey, 1956). For example, the con- 
tinued action of these temporal lobe 
regions may be necessary to the 
proper regulation of firing in the non- 
specific arousal system, which in turn 
apparently exerts considerable in- 
fluence on cortical activity (Magoun, 
1958). 

The existence of structures within 
the brain which are crucial to the 
fixation of memory traces is not re- 
stricted to the vertebrate orders. 
Boycott and Young (1950) have 
identified a cerebral structure (the 
vertical lobe) requisite for fixation of 
visual memory in the octopus, and 
apparently homologous in function 
to the temporal lobe structures found 
in the higher vertebrates. Thus, 
removal of the vertical lobe drasti- 
cally impairs the ability of the ani- 
mal either to acquire a new visual 
discrimination habit (motivated by a 
combination of food and electric 
shock), or to retain such a habit for 
any length of time following training. 
The nervous system of the octopus 
differs widely from the vertebrate
nervous system. However, the ap-
pearance of a specialized fixation
mechanism in both invertebrates and 
vertebrates suggests that there is 
some evolutionary utility in a dual 
process underlying memory function. 

At a more molecular level, the most
widespread hypothesis concerning 
the substrate of consolidation predi- 
cates its dependence on reverberatory 
circuits. This idea has its origins in 
the anatomical demonstrations of 
Lorente de Nó (1938) and has been
subscribed to in varying forms by 
Hebb (1949), Young (1953), and 
Gerard (1955). The basic supposi- 
tion is that reverberatory activity 
maintains the memory until the 
permanent changes underlying fixa- 
tion of the trace have been completed. 
This dual process hypothesis of mem- 
ory fixation has the advantage of 
explaining why interference with 
neural activity immediately after 
“learning” blocks retention while 
similar procedures instituted at a 
later time do not. One group of 
studies which may be directly rele- 
vant to the reverberatory circuit hy- 
pothesis of consolidation has been 
carried out by B. D. Burns and his 
co-workers (Burns, 1954, 1958). 
Burns has developed a technique 
which allows the isolation of small 
areas of cortex from the remainder of 
the brain, while leaving the blood 
supply to the area relatively unaf- 
fected. He has extensively studied 
the electrical activity of these iso- 
lated slabs in response to direct 
electrical stimulation. Interestingly 
enough, he has found: that a single 
train of pulses can initiate bursts of 
activity in one of these preparations
lasting for 30 minutes or more; that 
such bursts of activity can be blocked 
by a subsequently applied electrical 
stimulus; that such activity becomes
easier to evoke with repeated appli-
cations of the stimulus; and that the
burst activity is apparently due, in
part, to reverberatory activity among
groups of neurons, and in part to 
differential rates of depolarization 
within various segments of individual 
neurons. These first three observa- 
tions certainly coincide with what 
one would expect if such a process 
underlay consolidation. However, 
it is necessary to be cautious in gen- 
eralizing from the type of activity 
observed in these special preparations 
to that occurring in the intact brain. 
Burns (1958) himself has rejected 
these preparations as a general model 
for memory on the grounds that such 
circuits would be too susceptible to 
external interference. However, this
is precisely the aspect of the data which makes
Burns’ findings so attractive as a
model of the first phase of a dual pro-
cess theory of memory, since the sus-
ceptibility of learned material to inter-
ference provides the main behavioral evi-
dence for the existence of a consolida-
tion process.

Finally, moving to a still more 
molecular analysis of the problem, it 
is reasonable to inquire about the 
specific changes which might be pro- 
duced by some sort of perseverative 
process. Nearly all investigators 
have at this level proposed some sort 
of growth process or chemical change 
at the synapse. In this respect, our 
ideas have changed little from those 
of 1929 when Lashley wrote: 

We have today an almost universal acceptance
of the theory that learning consists of modifi-
cation of the resistance of specific synapses
within definite conduction units of the ner-
vous system.

After expressing numerous reserva-
tions about the adequacy of this
assumption, Lashley concluded by 
noting that: 


The synapse is, physiologically, a convention 
to describe the polarity of conduction in the 
nervous system of higher animals, together
with some similarities of function in the cen- 
tral nervous system and neuromuscular junc- 
tion. That these functions are due to the ac- 
tion of the intercellular membranes has not 
been directly demonstrated (p. 127). 


Here again, recent neurophysio- 
logical progress tempers Lashley's 
skepticism. The synapse is no longer 
a “convention” but a point-at-able 
structure which can be photographed 
and studied with the electron micro- 
scope (Palay, 1956). Further, as 
Lloyd (1949) and Eccles (1953) have 
shown, the rapid firing of impulses 
across synaptic junctions can result in 
increased excitability of these syn- 
apses for periods lasting from minutes 
to hours. There is general agreement 
that this increased excitability re- 
sults from the firing of presynaptic 
fibers, although it is not yet clear 
whether this is in turn due to an ac- 
tual change in the dimensions of the 
synaptic knobs as suggested by Ec-
cles (1953, 1957) or whether an alternative
explanation, such as that of Lloyd (1949), may
suffice. Eccles (1953) has proposed
this phenomenon of posttetanic po- 
tentiation as a general model for 
conditioning and memory. Such a 
proposal meets with many difficulties 
(Malmo, 1954). However, there is no 
question that a person ascribing 
learning to changes in synaptic ex- 
citability could do so with more con- 
fidence today than was possible 30 
years ago. 


CONCLUSIONS 


In the opinion of the writer, the 
over-all weight of evidence certainly 
favors the existence of some mecha- 
nism of consolidation (in spite of the 
fact that alternative explanations
are possible for many of the experi- 
ments which supposedly support the 
existence of such a process). Further- 
more, the application of available 
physiological procedures appears to 
offer a promising approach to defin- 
ing the structures involved in the 
fixation of memory traces. The most 
severe problems presented thus far 
have occurred as the result of con- 
founds in the behavioral test situa- 
tions employed, rather than through 
some defect in the modes of physio- 
logical interference. These problems 
are not insoluble, however, and an 
attempt was made to indicate this in 
the text of the paper. 

As a final point, the material re- 
viewed suggests the possibility that 
pseudoneurological speculation, re- 
sulting from strictly behavioral ob- 
servation, can result in productive 
physiological research—when the 
speculation is shrewdly conceived. 
Moreover, the physiologist would ap- 
pear to have already begun to repay 
this debt by suggesting purely be- 
havioral studies or new interpreta- 
tions of behavioral data. The studies 
demonstrating interfering effects of 
visual stimulation interpolated im- 
mediately after visual discrimination 
learning (Thompson, 1957b; Thomp- 
son & Bryant, 1955) are examples of 
such physiologically influenced “be-
havioral” investigations. Along simi-
lar lines, Walker’s (1958) reinterpre-
tation of reaction decrement and spon-
taneous alternation data, in terms of
mechanisms serving to protect con- 
solidation, appears to be equally 
sensitive to current physiological 
research. 

The term anxiety has enjoyed
great popularity in the writings and 
researches of psychologists in the last 
decade, and procedures for measuring 
this hypothetical state have pro- 
liferated wildly. There is every indi- 
cation that psychologists will con- 
tinue to develop and employ measures 
of anxiety in many areas of research, 
especially in the rapidly expanding 
number of studies of psychothera- 
peutic process and change, in the 
already booming area of psychophar- 
macology, in studying the effects of 
anxiety on performance, and in at- 
tempts to assess such constructs as 
aggression anxiety or sex anxiety. It 
is the purpose of this paper to first 
impose some restrictions upon the 
definition of anxiety, and then to 
focus upon the problem of assess- 
ment by physiological-behavioral 
measures. No attempt will be made 
in this paper to review the research 
and evaluate the problems associated 
with assessing anxiety by self-report 
techniques. 

One's theoretical approach to anxi- 
ety affects how one goes about meas- 
uring it; likewise results of attempts 
to assess anxiety should eventually 
modify and help refine the theoretical 
conception of anxiety. Thus the 
initial comments about the nature of 
anxiety should be considered as a 
rough formulation only, with both 
assessment procedures and theory 
modifying each other as investigation
proceeds. It is recognized that this
formulation, rough as it is, cannot 

1 This paper was prepared in part while the 
author held a visiting appointment at the 
University of North Carolina. The author 

wishes to thank Earl Baughman and Leonard 
Berkowitz for their helpful suggestions. 


include all that anxiety means to all
people, and that, accordingly, to make
this review manageable, it is neces-
sary to delimit the concept.


A CONCEPTION OF ANXIETY 


As a starting point it is proposed
that the construct of anxiety be con-
sidered similar and perhaps identical
to the reaction of fear, the neuro- 
physiological bases for which are not 
completely known but would seem to 
especially involve the functions of 
the posterior hypothalamus and its 
effects upon the sympathetic nervous 
system, the adrenal medulla, and the 
pituitary-adrenocortical system. The 
brain stem reticular formation may 
also play a part in this reaction. It is 
recognized that this is undoubtedly 
an oversimplification of the complex 
and interacting neurophysiological 
mechanisms involved in fear. This 
reaction may be largely innate, yet it
is likely that as a result of learning or 
constitutional predisposition individ- 
uals tend to have variations in the 
manner in which the anxiety reaction 
is expressed. 

It is further proposed that anxiety 
represents only one of many arousal 
states that can be differentiated from 
a more general state of activation as 
arousal becomes more intense. Thus 
the arousal that occurs when a person 
passes from a sleeping or very relaxed 
state to a waking, behaving state may
be of a fairly generalized sort with no 
specialized affective or motivational 
reactions involved. However, as 
arousal becomes more intense, differ-
entiation probably occurs and dis-
tinctive arousal states may emerge
relating to such constructs as anxiety,
anger, hunger, sex, or other emotional
or motivational states. Although it is 
possible that research will suggest 
the value of distinguishing anxiety 
from fear at the response level, or one 
kind of anxiety from another, it is 
perhaps best to demonstrate the 
utility of one construct of anxiety 
and its distinctiveness from other 
arousal states before adding unneces- 
sarily to the number of theoretical 
constructs extant. 

Anxiety also possesses the property 
of being highly learnable; that is, the
hypothetical response becomes read- 
ily conditioned to stimuli that do not 
innately elicit the response. This 
characteristic renders difficult if not 
impossible any attempt to define 
anxiety on the basis of stimuli that 
elicit it, since the stimuli that elicit it 
will vary widely from person to per- 
son. An exception would be direct 
electrical stimulation of the brain 
(Miller, 1958), where the effective 
antecedent stimulus might be well 
defined. 

As a consequence of the difficulty 
of approaching the construct of anxi- 
ety from the stimulus side in human 
subjects, the primary emphasis in 
this paper will be to review research 
relevant to the assessment of anxiety 
in terms of response patterns. The 
observable responses from which one 
might infer the strength of the anxi- 
ety reaction are of two basic types: 
physiological-behavioral responses 
and self-report responses. As previ- 
ously mentioned, this paper will be 
primarily concerned with the first 
type of response. 

In addition to the hypothetical 
anxiety state and its observable 
manifestations there are two other 
variables intimately related to anxi- 
ety which are kept conceptually 
distinct in the present view: namely, 
those stimuli (external or internal) 
which elicit the anxiety response, and 
those responses which have been learned because they reduce or avoid
the anxiety response. From the 
point of view of measurement the 
stimuli that evoke anxiety become 
important only if one wants to know 
what situations or thoughts or feel- 
ings elicit anxiety. Thus the common 
distinction between anxiety and fear 
in terms of the latter being in response 
to a realistic danger and the former 
being a response to unrealistic or un- 
known threats is basically a stimulus-defined difference and does not necessarily involve a difference in response.

There exists a possible source of 
confusion with respect to the re- 
sponses that have been learned to 
reduce anxiety in that clinicians 
frequently infer anxiety on the basis 
of these “defenses” against anxiety 
as much as from direct expression of
the anxiety itself. Again, from the 
point of view of theory as well as 
measurement it is preferable to keep 
these two variables distinct if pos- 
sible. In fact, it would seem likely 
that when a person is making a suc- 
cessful “defensive” response, no anxi- 
ety is present. To the extent that 
this is so it would be misleading to 
infer the strength of the momentary 
anxiety level from the presence of 
learned anxiety reducing responses. 


THE MEASUREMENT OF ANXIETY 


The foregoing theoretical analysis 
suggests that in spite of individual 
variations in response there might 
still be some pattern of physiological- 
behavioral responses associated with 
anxiety arousal that would be distinct 
from other patterns of response asso- 
ciated with other emotional or arousal 
states. Findings based primarily on 
physiological response patterns will 
be considered first followed by find- 
ings based primarily on behavioral 
response patterns. Two basic ques- 
tions will be asked with respect to 
both the physiological and the behavioral evidence: (a) Does a distinctive pattern of responses emerge,
tentatively identifiable as reflecting 
anxiety, that can be distinguished 
from other patterns associated with 
other arousal states, when the differ- 
ing arousal states have been experi- 
mentally induced? (b) What is the
nature of the intercorrelations among 
physiological or behavioral measures 
which have been obtained under the 
same experimental conditions, and is 
there any evidence of a distinguish- 
able cluster of intercorrelated vari- 
ables that might be tentatively 
identified as reflecting anxiety? The 
studies do not always lend them- 
selves to a clear-cut analysis in these 
terms but these are the guiding ques- 
tions being considered. 


Physiological Measures: Experimental Comparisons


The studies of primary interest 
here are those in which an attempt 
was made to distinguish between two 
or more experimentally induced 
arousal states where one of these was 
considered to represent a fear or anxi- 
ety reaction. There are three studies 
that most closely follow this para- 
digm. Ax (1953) reports a study in 
which a variety of physiological 
measures were obtained from normals 
under conditions presented in coun- 
terbalanced order that were designed 
to elicit fear and anger, respectively. 
The fear condition was ingeniously 
contrived to make the subject think 
that the apparatus was faulty and 
that he was in real danger of receiv- 
ing a severe, perhaps even fatal, 
electric shock. Anger was aroused by 
an obnoxious assistant who generally 
insulted and belittled the subject. 
Schachter (1957) repeated Ax's study
using hypertensive, potential hyper- 
tensive, and normotensive subjects, 
and added a pain experience (cold 
pressor test) to the fear and anger 
situations. All subjects received the 
treatments in the same order: pain, 
fear, and anger. Lewinsohn (1956)
obtained three physiological measures 
plus a measure of finger tremor on 
groups of normals, anxiety reaction 
patients, ulcer patients, and hyper- 
tensive patients subjected in coun- 
terbalanced order to the cold pressor 
test and a failure experience accom- 
panied by criticism and electric 
shock. Another study that is highly 
relevant to the issue but which em- 
ployed a somewhat different research 
strategy is that of Funkenstein, King, 
and Drolette (1957). After stressing 
their college student subjects they 
determined in a poststress interview 
whether a subject had tended to 
experience anger outwardly directed, 
anger inwardly directed, or anxiety. 
The scores obtained were limited to 
blood pressure and ballistocardio- 
graphic measures. 

The results of these four studies 
are summarized in Table 1. Most 
scores in the Ax and Schachter stud- 
ies represent difference scores be- 
tween prestress resting level and the 
highest (or in some cases the lowest) 
level reached during stress. The 
scores in the Lewinsohn study represent differences between the mean during rest and the mean during stress, with
the exception of the GSR score which 
represents the largest deflection dur- 
ing stress. All scores reported from 
the Funkenstein study are percent- 
age changes from prestress levels. 
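As a rough illustration of how these three scoring conventions differ, the sketch below computes each one from a short series of readings. The function names and the heart rate values are hypothetical and are not data from any of the four studies; the sketch assumes a single prestress value (or a list of resting readings) and a list of readings taken during stress.

```python
from statistics import mean

def peak_difference(prestress, stress, use_lowest=False):
    """Ax/Schachter-style score: extreme level reached during stress minus
    the prestress resting level. use_lowest=True takes the lowest stress
    reading, for measures (e.g. hand temperature) that react by decreasing."""
    extreme = min(stress) if use_lowest else max(stress)
    return extreme - prestress

def mean_difference(rest, stress):
    """Lewinsohn-style score: mean during stress minus mean during rest."""
    return mean(stress) - mean(rest)

def percent_change(prestress, stress_value):
    """Funkenstein-style score: percentage change from the prestress level."""
    return 100.0 * (stress_value - prestress) / prestress

# Hypothetical heart rate readings (beats per minute).
rest = [72, 74, 73]
stress = [80, 88, 84]
print(peak_difference(73, stress))        # 15
print(mean_difference(rest, stress))      # 11
print(round(percent_change(73, 88), 1))   # 20.5
```

The same readings give three different numbers, which is one reason the magnitudes in Table 1 are comparable only within a study.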

In spite of some inconsistencies 
among the studies there does appear 
to be evidence for distinguishable 
response patterns that can be tenta- 
tively associated with the constructs 
of fear (anxiety) and anger. Diastolic 
blood pressure increased more for 
anger than fear in all three studies in 
which fear and anger states were 
thought to be aroused (significantly 
different from chance in two studies). 
Heart rate increased more in fear 
than anger in all three studies (signifi- 
cant in two). Maximum heart rate 
TABLE 1
COMPARISON OF PHYSIOLOGICAL MEASURES ASSOCIATED WITH
DIFFERENT EMOTIONAL AROUSAL STATES IN FOUR STUDIES

Entries follow the column order Ax (Fear, Anger); Schachter (Fear, Anger, Pain); Lewinsohn (Fear, Pain); Funkenstein (Fear, Anger-out). Studies that did not report a measure have no entry in that row.

Measure                              Entries
Systolic blood pressure              20.4    19.2    22.5    21.1    17.8    [illegible]
Diastolic blood pressure             14.5*   17.8*   18.7    14.8    11.8    9.9*    22.3*
Heart rate (+)                       [illegible]     25.8    18.7*   10.8*   0.3*    [illegible]
Heart rate (−)                       4.0*    6.0*    [illegible]
Cardiac output                       6.7*    3.0*    −0.25   61.9*   −3.2*
Peripheral resistance                −1.10*  0.04*   1.28    −19.3*  32.9*
Hand temperature (−)                 −.045   −.050   .036*   .030*   .024*
Palmar conductance                   14.8*   9.4*    −1.99*ᵃ −2.18*ᵃ −2.33*ᵃ
GSR, largest deflection in stress    [illegible]
No. GSRs                             4.7*    11.6*
Respiratory rate                     6.0*    2.3*    2.8*    2.1*    0.7*
Frontalis muscle tension             3.34*   4.35*   1.30    2.26    1.65
No. muscle potential peaks           13.2*   10.5*
Finger tremor                        87      118
Salivary output                      −0.9    7.9

ᵃ Schachter used the transformation log 1/(Ri − Rs), where Ri = initial resistance and Rs = lowest resistance during stress. The smallest negative number, −1.99, for fear accordingly refers to the largest decrease in resistance.
* Significant at the .05 level; for Schachter this is based on an overall analysis of variance for the three conditions.


decrease was significantly greater in 
anger than fear in the one study in 
which it was reported. Cardiac out- 
put increased significantly more in 
fear than anger in the two studies 
in which it was reported, and periph- 
eral resistance decreased significantly 
more in fear than anger in both studies where it was reported. Palmar conductance increased significantly more in fear than anger in the two studies where it was reported. Number of discrete GSRs, however, was significantly higher in anger than fear in the one study where this was measured. Respiration rate increased significantly more in fear than anger in the two studies reporting this measure. Frontalis muscle tension increased more in fear than anger in the two studies measuring it (significant in one).

Another study that was not included in the tabular presentation provides additional support for the different heart rate responses associated with anxiety and anger. DiMascio, Boyd, and Greenblatt (1957) studied one psychotherapy patient over 11 interviews and found a correlation (rho) of .69 between average heart rate and amount of rated tension (anxiety?) in the interviews, and a correlation of −.37 between average heart rate and amount of rated antagonism in the interviews.

The two studies in Table 1 involv- 
ing a painful experience, the cold 
pressor test, suggest that this arousal 
state may also be distinguishable 
from fear, although the differentia- 
tion of pain and anger is less clear. 
It is, of course, not possible to know 
from these results how specific these 
reactions might be to the cold pressor 
test as opposed to pain stimulation 
generally. 

Funkenstein et al. (1957) propose a 
theory that may serve to provide 
some integration for these various 
findings. They suggest that the 
physiological reaction accompanying 
anger-out is a norepinephrine-like 
reaction and that accompanying anxi- 
ety is an epinephrine-like reaction. 
The physiological reactions accom- 
panying injections of epinephrine 
and norepinephrine have been in- 
vestigated by Goldenberg, Pines, 
Baldwin, Greene, and Roh (1948), 
Barcroft and Konzett (1949), De- 
Largy, Greenfield, McCorry, and 
Whelan (1950), Goldenberg (1951), 
Swan (1952), and Clemens (1957). 
In general it is found that epinephrine leads to increased palmar conductance, systolic blood pressure,
heart rate, cardiac output, forehead 
temperature, central nervous system 
stimulation, blood sugar level; and 
decreased diastolic blood pressure, 
peripheral resistance, hand tempera- 
ture, and salivary output. Norepi- 
nephrine leads to increased systolic 
and diastolic blood pressure and pe- 
ripheral resistance, no change or a 
slight decrease in heart rate and 
cardiac output, and only slight in- 
creases in central nervous system 
stimulation and blood sugar level. 

It is generally thought that reac- 
tions associated with norepinephrine 
are more limited, possibly restricted 
to peripheral vasoconstriction result- 
ing from secretion at the sympathetic 
nerve endings, than are the reactions 
to epinephrine. However, no studies 
were found in which the effects of 
injected norepinephrine upon a wide 
range of responses including palmar 
conductance, hand or finger tempera- 
ture, respiration rate, salivary out- 
put, or muscle potentials were as- 
sessed. In terms of the measures that 
have been obtained under both kinds 
of hormonal injections (Barcroft & 
Konzett, 1949; DeLargy et al., 1950; 
Goldenberg et al., 1948), heart rate, 
diastolic blood pressure, cardiac out- 
put, and peripheral resistance ap- 
pear to be the most discriminating. 
Neither cardiac output nor periph- 
eral resistance is readily obtainable 
by direct measurement. Cardiac out- 
put is usually inferred from ballisto- 
cardiographic measures, and periph- 
eral resistance is usually estimated by 
dividing mean arterial blood pressure 
by cardiac output. 
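The estimate of peripheral resistance described above comes down to a single division. The sketch below assumes that mean arterial pressure is approximated as diastolic pressure plus one third of the pulse pressure, a common rule of thumb that is not stated in the studies reviewed here, and that cardiac output is given in litres per minute; the function names and blood pressure values are hypothetical.

```python
def mean_arterial_pressure(systolic, diastolic):
    """Approximate mean arterial pressure in mm Hg.

    Assumption: the one-third pulse-pressure rule of thumb, not a method
    taken from the studies reviewed here."""
    return diastolic + (systolic - diastolic) / 3.0

def peripheral_resistance(systolic, diastolic, cardiac_output):
    """Estimate peripheral resistance as mean arterial pressure divided by
    cardiac output (L/min); the result is in mm Hg per (L/min)."""
    if cardiac_output <= 0:
        raise ValueError("cardiac output must be positive")
    return mean_arterial_pressure(systolic, diastolic) / cardiac_output

# Hypothetical values: a resting state, then a state with raised cardiac output.
print(round(peripheral_resistance(120, 80, 5.0), 2))  # 18.67
print(round(peripheral_resistance(135, 75, 6.5), 2))  # 14.62
```

In the second call the rise in cardiac output outweighs a small rise in mean pressure, so the computed resistance falls, which is the direction listed above for epinephrine. Because the estimate depends on an inferred cardiac output, errors in the ballistocardiographic measure carry straight into it.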

Funkenstein et al. (1957) divided 
their subjects into subgroups on the 
basis of epinephrine-like, norepi-
nephrine-like, and indeterminate re- 
actions and found a highly signifi- 
cant relationship in the expected 
direction between these physiological 
reaction types and the tendency to respond by anger-out as opposed to anxiety. Schachter (1957), making use of a greater variety of physiological measures, likewise computed an
index of epinephrine- and norepi- 
nephrine-like reactions and found 
these indices to vary significantly as 
a function of the pain, anger, and 
fear conditions with pain showing the 
most norepinephrine-like reaction 
and fear the most epinephrine-like re- 
action with anger falling in between. 

Although it would be premature to 
conceptualize the anxiety reaction as 
being entirely defined by the results 
of epinephrine secretion, the distinc- 
tion between the epinephrine- and 
norepinephrine-like reactions may 
well be an important one for anxiety 
measurement. The secretion of 
epinephrine and norepinephrine from 
the adrenal medulla and the release 
of norepinephrine at the sympathetic 
nerve endings are all affected by 
sympathetic nervous system stimula- 
tion. The fact that these two hor- 
mones produce quite different reac- 
tions points up what has long been 
known: namely, that it is a great 
oversimplification to speak of sym- 
pathetic arousal as if it were a uni- 
tary function. Although the response 
pattern associated with experimen- 
tally induced anxiety conforms rather 
closely to the response pattern asso- 
ciated with epinephrine injection, 
the response pattern associated with 
anger is not as closely related to the 
responses produced by norepineph- 
rine injection. Perhaps the distinc- 
tion between anxiety and anger, at 
the humoral level, is one involving 
the relation of epinephrine to norepi- 
nephrine in which anxiety is associ- 
ated with a purer epinephrine-like 
reaction and anger with a mixed 
pattern of epinephrine and norepi- 
nephrine responses. 
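One crude way to express the idea that a response pattern "conforms closely" to the epinephrine pattern is to compare the direction of each observed change with the direction listed earlier for each hormone. The sketch below does only that. It is a hypothetical illustration, not the index that Schachter (1957) or Funkenstein et al. (1957) actually computed, and coding norepinephrine's "no change or a slight decrease" in heart rate and cardiac output as a decrease is an assumption.

```python
# Direction of change after injection (+1 increase, -1 decrease), as summarized above.
EPINEPHRINE = {
    "palmar conductance": +1, "systolic BP": +1, "heart rate": +1,
    "cardiac output": +1, "diastolic BP": -1, "peripheral resistance": -1,
    "hand temperature": -1, "salivary output": -1,
}
# Assumption: "no change or a slight decrease" is coded as -1.
NOREPINEPHRINE = {
    "systolic BP": +1, "diastolic BP": +1, "peripheral resistance": +1,
    "heart rate": -1, "cardiac output": -1,
}

def sign(x):
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)

def agreement(observed, profile):
    """Fraction of measures present in both dicts whose observed change
    has the same sign as the hormone profile; None if nothing overlaps."""
    shared = [m for m in observed if m in profile]
    if not shared:
        return None
    hits = sum(sign(observed[m]) == profile[m] for m in shared)
    return hits / len(shared)

# Hypothetical change scores for one subject.
observed = {"systolic BP": 20, "diastolic BP": -2, "heart rate": 25,
            "cardiac output": 6, "peripheral resistance": -1.1}
print(agreement(observed, EPINEPHRINE))     # 1.0
print(agreement(observed, NOREPINEPHRINE))  # 0.2
```

A sign count ignores magnitude, so it can only flag which profile a pattern resembles; the mixed epinephrine-norepinephrine pattern suggested above for anger would show up as intermediate agreement with both.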

There are other studies where one 
or two physiological measures have 
been obtained under conditions likely
to arouse anxiety. For example, 
Hickham, Cargill, and Golden (1948) 
found heart rate and cardiac output 
to increase substantially in medical 
students before what was considered 
to be an anxiety arousing situation, 
an oral examination, as compared to 
more relaxed conditions a month 
later. Likewise, Malmo, Boag, and 
Smith (1957) report increased heart 
rate in neurotic subjects after criti- 
cism as compared with decreased 
heart rate after praise. Although 
studies of this kind tend to be con- 
sistent with the previously described 
studies, they do not shed additional 
light on the question of whether some 
pattern of response related to anxiety 
can be differentiated from patterns 
of response associated with other 
kinds of arousal states. 

Davis (1957), Davis and Buchwald 
(1957), and Davis, Buchwald, and 
Frankman (1955) also report evi- 
dence that different stimuli elicit 
distinctive autonomic response pat- 
terns. There is no reason to believe, 
however, that any of their stimuli, 
for example, pictures of nudes, land- 
scapes, etc., were likely to evoke 
anxiety in many of their subjects. 
These studies do point to the possible 
subtleties in autonomic patterns as- 
sociated with various kinds of stimu- 
lation or arousal states, and caution 
against any too ready acceptance of 
some particular pattern as being the 
anxiety or the anger pattern. All of 
the studies described thus far, though, 
are consistent with the possibility 
that some pattern of physiological 
measures may allow one to infer the magnitude of the hypothetical anxiety reaction as distinct from other hypothetical states such as anger or pain.

Physiological Measures: Group Comparisons


There is a host of studies in which physiological measures are contrasted between normals and various clinical groups presumed to be in general more anxious than the normals.
The studies that will be considered 
here are those involving patient 
groups in which the presence of mani- 
fest anxiety was reported to be a 
prominent part of the symptom pic- 
ture; accordingly, much of the physi- 
ological research on such psychoso- 
matic disorders as hypertension and 
peptic ulcer will not be summar- 
ized. 

Sherman and Jost (1942) found 15 
neurotic children to have lower rest- 
ing level palmar conductance than 18 
well adjusted children, but more 
resting level hand tremors, lower per- 
centage of alpha rhythm in the EEG, 
and faster respiration rate than well 
adjusted children. No differences 
were found for heart rate or blood 
pressure. Although measures were 
taken in a series of seven conditions, 
the results described above appeared 
to represent differences in general 
level rather than different degrees of 
reaction to the various conditions. 
Jurko, Jost, and Hill (1952) obtained 
measures on 25 normals, 20 neurotics, 
and 10 schizophrenics (all adults) 
while administering the Rosenzweig 
P-F test, and found heart rate, 
respiration rate, and respiration vari- 
ability higher in patient than normal 
groups before and during the test 
administration. A body movement 
score was highest for the schizo- 
phrenics and lowest for the normals. 
Palmar conductance was again found 
to be inconsistent with the general 
pattern, being highest for the nor- 
mals and lowest for the schizophrenics 
before and during test administra- 
tion. In neither of these two studies, 
however, was any attempt made to 
restrict the sample of neurotics to 
patients in which anxiety was the 
most prominent symptom. 

GSR conditioning rate, on the 
other hand, has been found to be 
faster in more anxious subjects
(Bitterman & Holtzman, 1952; Schiff, 
Dougan, & Welch, 1949; Welch & 
Kubis, 1947). 

White and Gildea (1937) found 
that patients in which anxiety was a 
prominent symptom showed greater 
heart rate increases to the cold pressor 
test than did normals. On the surface 
such a finding appears contradictory 
to the results of Schachter (1957) in 
which the physiological responses 
associated with the cold pressor test 
were clearly distinguishable from 
those associated with anxiety. White 
and Gildea, however, obtained meas- 
ures during a rest period, during a 
brief anticipation period in which the 
experimenter moved the dish of ice 
water close to the subject, and during 
the immersion itself. For the normal 
group the average heart rates for 
these three periods were 75.7, 81.5, 
and 80.0, respectively; and for a 
group of anxiety neurotics 81.0, 90.0, 
and 95.5, respectively. Clearly, it 
was the anticipation of the experience 
that led to increased heart rate for 
the normals, not the pain experience 
itself. The anxious patients likewise 
showed their greatest increase during 
anticipation. These results suggest 
that anticipation of the cold pressor 
test is anxiety arousing, and might 
yield a different pattern of response, 
in normals at any rate, than the pain 
experience itself. 
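The argument above rests on simple differences between the three periods. The sketch below recomputes them from the average heart rates reported for White and Gildea (1937); the function name is hypothetical.

```python
# Average heart rates (beats/min) reported for White and Gildea (1937):
# rest, anticipation of the cold pressor test, and immersion.
heart_rates = {
    "normals": (75.7, 81.5, 80.0),
    "anxiety neurotics": (81.0, 90.0, 95.5),
}

def period_changes(rest, anticipation, immersion):
    """Return (rest -> anticipation, anticipation -> immersion) changes."""
    return round(anticipation - rest, 1), round(immersion - anticipation, 1)

for group, rates in heart_rates.items():
    to_anticipation, to_immersion = period_changes(*rates)
    print(f"{group}: {to_anticipation:+} on anticipation, "
          f"{to_immersion:+} on immersion")
```

For the normals the whole rise comes on anticipation (+5.8) and rate actually falls on immersion (−1.5); the anxiety neurotics rise by 9.0 on anticipation and by a further 5.5 on immersion.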

The above results of White and 
Gildea as well as the results of 
Schachter (1957) and Lewinsohn
(1956) argue against the theoretical 
formulation of Mowrer (1939) that 
anxiety (fear) is the conditioned 
form of the pain reaction. 

In the Lewinsohn (1956) study 
previously mentioned, resting level 
palmar conductance was highest for 
the anxiety reaction group and lowest 
for the ulcer group, with normals and 
hypertensives falling in between. 
Resting level salivary output was highest in the ulcer group and about the same for the other three groups. Somewhat surprisingly, resting level heart rate was lowest for the anxiety group. The change scores showed no particular tendency to be associated with the diagnostic groups. Wishner (1953) found resting level heart rate to be higher in 11 anxiety neurotics than in 10 normals, and a tendency, not significant, for respiration rate to be faster in the neurotics. Funkenstein, Greenblatt, and Solomon (1951, 1952) conclude that patients with anxiety and depressive symptoms are manifesting a chronic epinephrine-like reaction, whereas patients with paranoid tendencies or who are otherwise directing their anger and blame upon the external world are manifesting a chronic norepinephrine-like reaction. Their conclusions are based primarily on the patients' reactions to the mecholyl test (Funkenstein, Greenblatt, & Solomon, 1950).

Malmo (1950, 1957) has summarized his research with respect to physiological measures found to discriminate between normals and patients with pathological degrees of anxiety. In his 1957 article he concludes that anxious patients show greater reactivity in many measures regardless of the kind of stress used. Thus, Malmo and Shagass (1949a), using a painful thermal stimulation of the forehead as their stress, found anxiety neurotics and early schizophrenics to show more finger movements, greater neck muscle potentials, more head movements, more respiratory irregularities, and greater heart rate variability than normal controls. Percent change of the GSR showed no significant relationship. These results have been generally borne out in other studies using different stresses: Malmo, Shagass, and Davis (1951); Malmo, Shagass, Belanger, and Smith (1951). The results of Malmo and Smith (1955) suggest that
frontalis muscle tension may be a 
more sensitive discriminator between 
normals and anxiety neurotics than 
forearm muscle tension. 

Wenger (1948) using considerably 
larger Ns than most investigators 
compared resting state physiological 
measures of 225 patients with the 
diagnosis of operational fatigue, 98 
hospitalized psychoneurotics, and a 
normative group of 488 unselected 
preflight students in the Army Air 
Force. The 10 measures that signifi- 
cantly discriminated between the 
operational fatigue group and the 
normal group were salivary output, 
palmar conductance, systolic and 
diastolic blood pressures, sinus ar- 
rhythmia, heart period, sublingual 
temperature, finger temperature, res- 
piration period, and tidal air mean. 
The operational fatigue group showed 
sympathetic dominance on all of the 
above measures except sublingual 
temperature. For 47 patients in the 
operational fatigue group Wenger 
obtained repeat measures on most 
variables at a later time when they 
were considered improved and ready 
to return to duty. Of the 20 variables 
tested only palmar conductance, 
heart period, and finger temperature 
showed significant changes, and these 
were all in the direction of lessened sympathetic arousal. The results
with respect to the hospitalized psy- 
choneurotics, although not yielding 
exact correspondence on specific 
measures, also showed a strong sym- 
pathetic dominance for this clinical 
group. 

Gunderson (1953) obtained 12 resting state autonomic measures, selected on the basis of Wenger's previous work, on a sample of 110 early schizophrenics with an average length of hospitalization of about 2 years. Nine measures—salivary output, dermographic latency, dermographic persistence, systolic blood pressure, diastolic blood pressure,
finger temperature, heart rate, respi- 
ration rate, and sublingual tempera- 
ture—were significantly different 
from Wenger's normative group of 
aviation cadets, and with the excep- 
tion of sublingual temperature all 
were in the direction of greater sym- 
pathetic arousal. Palmar conduct- 
ance failed to discriminate and was, 
in fact, almost identical for the two 
groups. This schizophrenic sample 
also showed significantly greater sym- 
pathetic arousal in seven of these 
measures than Wenger's neurotic 
group. As Gunderson points out, this
indication of greater anxiety in the 
schizophrenic group may well not 
exist in more chronic patients. Gun- 
derson also divided the schizophrenic 
subjects into those that had improved 
the most and least with shock ther- 
apy and found the most improved 
group to show less general sympa- 
thetic arousal as measured by 
Wenger’s autonomic balance score, 
the conclusion being that improve- 
ment had been accompanied by a 
decreased arousal. 

There are difficulties involved in 
comparing these studies in which 
anxiety is assumed to be present by 
virtue of a psychiatric diagnosis with 
those in which anxiety was produced 
experimentally. For example, if 
anger or annoyance does involve a 
distinctive arousal state and if such a 
state is present more often in some of 
these patient groups than in normals, 
a not unlikely assumption, then the 
pattern of mean scores may reflect a 
mixture of anxiety and anger as well 
as other arousal states. Nevertheless, 
many measures which belong to the 
epinephrine-like pattern of reaction 
are found to consistently discrimi- 
nate, with an occasional exception, 
between anxious patients and nor- 
mals. By and large it would appear 
that so-called resting state measures 
discriminate between the patients 
and normals as well, and in some
cases better, than do change scores 
associated with experimental stress. 
Some of the studies reporting change 
score results may be misleading since 
in most cases the patient groups 
start out with higher initial level 
scores. The high negative correlation 
between initial level and the magni- 
tude of the change score that prevails 
for most autonomic measures might 
well obscure some real differences 
that would have emerged if this cor- 
relation had been partialed out by 
some procedure such as Lacey’s 
(1956) autonomic lability score. 

It is also possible that the particu- 
lar pattern of autonomic responses 
associated with an immediate threat 
situation is different from the “steady 
state” pattern of more chronically 
elevated responses found in many 
psychiatric patients. It is interesting 
in this regard that Wenger (1957) 
in recent pattern analyses of the data 
in his various samples reports not 
only patterns of sympathetic and 
parasympathetic dominance but a 
pattern composed of a mixture of 
sympathetic and parasympathetic 
type of responses. This latter pat- 
tern, which Wenger calls the B pat- 
tern, consists of three sympathetic 
type tendencies, high heart rate, 
high systolic blood pressure, and low 
salivary output; and two character- 
istics of parasympathetic innervation 
or lack of sympathetic arousal, high 
finger temperature and low palmar 
conductance. The sympathetic pat- 
tern occurred more frequently in 
neurotic and schizophrenic samples 
than in the normal group, but not 
more frequently in the operational 
fatigue or a psychosomatic sample 
than in the normal group. The B 
pattern occurred more frequently in 
all of the four psychiatric groups than 
in the normal group. Perhaps this B 
pattern represents a more chronic 
result of psychological stress which could be distinguished from the anxiety state as presently conceived. Such an interpretation is consistent with the common clinical view that psychosomatic symptoms frequently serve an anxiety reducing function. It is also noteworthy that the findings of Sherman and Jost (1942) and Jurko, Jost, and Hill (1952) of low resting level palmar conductance in a pattern otherwise suggestive of sympathetic activation in neurotic patients are consistent with the existence of Wenger's B pattern.

To carry speculation a bit further 
in this area, it may be that there are 
systematic differences in response 
patterns as a function of the chronic- 
ity of the stress, as suggested by 
Selye (1950). Thus the pattern(s) of 
immediate change scores associated 
with discrete stimuli (electric shock 
or a threatening word) may be differ- 
ent from the pattern(s) of response 
associated with a stress of longer dura- 
tion but still essentially temporary 
or situational (oral examination, the 
general situation in an electric shock 
experiment, or an appointment for a 
first psychotherapy hour), where the 
change scores would have to be based 
upon measures obtained at some 
more relaxed time. And both of the 
above kinds of patterns might differ 
from patterns of response resulting 
from stress continuing over months or 
years as would be the case with psy- 
chiatric patients. The distinctive 
characteristics of responses associated 
with the second as opposed to the 
first type of stress may result from 
humoral effects being added to the 
more direct and shorter latency 
effects of autonomic nervous system 
stimulation. 

There have been several other ap- 
proaches to the physiological assess- 
ment of anxiety employing measures 
less readily obtainable and also less 
amenable to continuous recording 
than most of the ones considered above. Ulett, Gleser, Winokur, and Lawler (1953) and Shagass (1955b) report that the EEG of anxious patients can be more readily “driven” at higher frequencies than is the case for normals or less anxious patients.
There was no tendency for the aver- 
age undriven alpha frequency to be 
different for the groups. Shagass 
(1955a) further reports that changes 
in the driven EEG frequency corre- 
spond to changes in anxiety level for 
the same person measured at different 
times. 

Sedation threshold is also reported 
by Shagass (1954) and Shagass and 
Naiman (1955) to be related to anxi- 
ety level in patients. Basowitz, 
Persky, Korchin, and Grinker (1955) 
find more hippuric acid in the urine 
of paratrooper trainees assessed to be 
anxious than those not anxious, and 
also more in anxiety neurotics than in 
normals. 


Physiological Measures: Intercorrelations


On the basis of the research just 
summarized one might assume that 
many of the measures found to be re- 
lated to experimentally induced or 
clinically assessed anxiety would show substantial intercorrelations. Research thus far gives little ground for optimism that these variables will correlate very highly, if at all. However, it should be pointed out that few studies provide much direct evidence on the question: namely, correlations among changes in measures obtained under resting conditions and in a clearly fear- or anxiety-arousing situation. Ax (1953) intercorrelated the seven physiological change scores that significantly discriminated between the fear and anger conditions. The intercorrelations of these scores under the anger condition tended to be higher than under the fear condition, and the correlations were for the most part insignificant for fear. Schachter (1957) did
not report intercorrelations among 
his measures but did find significantly 
more variability among the measures 
under fear than anger. Lewinsohn 
(1956) likewise reported intercorrela- 
tions among his four variables for 
base level scores, for change scores to 
the cold pressor test, and for change 
scores to the failure-criticism condi- 
tion. Only a few correlations were 
significant, probably not more than 
could have occurred by chance. Terry 
(1953) intercorrelated a number of 
physiological change scores associ- 
ated with doing arithmetic problems 
under distracting noise conditions. 
The intercorrelations between differ- 
ent autonomic systems were very 
low and for the most part insignifi- 
cant. Only measures of closely re- 
lated functions, such as systolic and 
diastolic blood pressure, correlated to 
any degree. It is possible that the 
stress condition was not particularly 
anxiety arousing for most subjects. 
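Whether a handful of significant coefficients exceeds chance is simple arithmetic. The sketch below assumes that Lewinsohn's four variables yield 6 pairwise correlations for each of his three score sets (18 in all), a .05 significance level, and independent tests. The last assumption does not strictly hold for correlations that share variables, so the figures are a rough benchmark only, not a reanalysis of his data.

```python
from math import comb


def expected_chance_hits(n_tests: int, alpha: float = 0.05) -> float:
    """Expected count of 'significant' results when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k: int, n_tests: int, alpha: float = 0.05) -> float:
    """P(at least k of n_tests reach significance by chance alone).

    Assumes independent tests (a simplification: correlations that share
    variables are not independent), so this is only a rough benchmark.
    """
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


n_variables = 4   # four physiological variables
n_score_sets = 3  # base level, cold pressor change, failure-criticism change
n_tests = comb(n_variables, 2) * n_score_sets  # 6 pairwise correlations per set

print(n_tests)                              # 18
print(expected_chance_hits(n_tests))        # 0.9
print(round(prob_at_least(2, n_tests), 3))  # 0.226
```

Under these assumptions, about one "significant" correlation is expected by chance, and two or more would turn up in roughly one study in four.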

Sherman and Jost (1942) in con- 
trast to the above studies did find a 
number of significant correlations 
among their physiological variables 
for neurotic and normal children 
combined. Although their correlation 
matrix is based on a mixture of abso- 
lute level scores, percent change 
scores, and scores obtained at differ- 
ent points in a sequence of seven 
conditions, there does seem to be a 
cluster of fairly highly intercorre- 
lated variables suggesting some 
arousal dimension. The measures 
most highly intercorrelated are hand 
tremor, percent heart rate change, 
percent alpha dominance (negative 
correlations), and respiratory vari- 
ability. Weybrew (1959) intercorre- 
lated 12 physiological change scores 
and 4 personality ratings. The physiological measures were obtained
before and after the subjects were exposed to a standardized
situational stress. Correlations among the physiological change scores
were in general low, and the results of a factor analysis were not easy
to interpret.

There are just not enough studies 
with enough significant correlations 
between change scores to attempt 
any generalizations from the results. 
A general problem encountered in working with autonomic change scores
concerns the type of transformation, if any, to use. Correlations, for
example, among Lacey's
(1956) autonomic lability scores 
would appear to provide a more 
meaningful picture of the tendency of 
measures to covary than would be 
obtained by using absolute change, 
percentage change, or most other 
transformations, since as previously 
mentioned Lacey’s score more ade- 
quately partials out the usual high 
negative correlation between change 
and initial level. The degree to which 
correlations among autonomic change 
scores can be affected by partialing 
out the correlation with initial level 
is shown in the results of Mandler 
and Kremen (1958). They intercor- 
related scores obtained under a failure 
stress condition from five different 
response systems (GSR, heart rate, 
respiration, face temperature, and 
blood volume) including in some cases 
absolute change scores along with 
Lacey's autonomic lability score. 
Absolute heart rate change yielded a 
correlation of .27 with change in 
respiration rate, whereas heart rate 
with initial level partialed out yielded 
a correlation of -.17; or in another
case absolute heart rate change cor- 
related only .02 with inspiration 
amplitude (with initial level of inspi- 
ration amplitude partialed out) but 
heart rate with initial level partialed 
out correlated .31 with the same 
measure. It is clear that correlations 
among autonomic measures will be 
greatly affected by the way in which 
the relation to initial level is handled. 
The findings of Lacey (1950),
Lacey and Van Lehn (1952), Lacey, 
Bateman, and Van Lehn (1953), even 
though based on stressors that for the 
most part cannot be accepted as 
clearly anxiety arousing, provide such 
a strong argument for individual pat- 
terns of autonomic response that they 
should not be ignored in this context. 
Using various samples (college stu- 
dents and mothers of children in the 
Fels longitudinal research program)
and various stressors (cold pressor 
test, hyperventilation, mental-arith- 
metic, and word fluency), Lacey et al. 
(1953) find that different subjects 
have different patterns of autonomic 
response which are reproducible over 
time and are consistent over these 
different stressors. Thus one subject 
may respond to the stress by a large 
increase in heart rate and only a small 
increase in skin conductance and an- 
other may respond with the opposite 
pattern. To the extent that such find- 
ings can be generalized to a clearly 
fear arousing situation the conclusion 
is clear that one cannot expect inter- 
correlations among autonomic change 
scores to be very substantial. The point to be emphasized here,
however, is not that several autonomic measures fail to increase under
anxiety arousing circumstances; for almost all people they may well
increase. The point is rather that the measures showing the most or the
least increase vary from person to person. Such a state of affairs is
not necessarily disastrous to one interested in using physiological
measures
in assessing anxiety. The moral, how- 
ever, remains clear that for a given 
individual some physiological meas- 
ures may be much more sensitive 
indicators of change in anxiety level 
than others. 

A somewhat similar point of view is espoused by Malmo, Shagass, and
Davis (1950), who propose the principle of symptom specificity: namely,
that psychiatric patients are inclined to respond to stress of all
kinds by a particular physiological 
mechanism that leads to the par- 
ticular kind of somatic complaint 
that the patient may have. Thus, 
Malmo and Shagass (1949b) found 
that patients with heart complaints 
showed greater heart rate and heart 
rate variability under stress than 
patients without heart complaints. 
Specificity of muscle potential reac- 
tion was demonstrated by Malmo, 
Smith, and Kohlmeyer (1956) who 
showed that for the same patient 
discussion of hostility conflicts was 
associated with increased forearm 
muscle tension and discussion of sex 
conflict was associated with increased 
leg muscle tension. 

There are other studies in which 
intercorrelations among a number of 
physiological measures are reported, 
such as Wenger (1942, 1948) or 
Gunderson (1953) in which all meas- 
ures were obtained under resting 
conditions. If people manifest vary- 
ing degrees of an autonomic response 
pattern determined by the amount of 
anxiety that they “bring into” the 
resting situation then such a pattern 
should show up as a cluster of inter- 
correlated variables. Wenger’s (1942) 
earlier factor analytic work with 
children did yield a dimension that he 
called the autonomic factor, which 
when unbalanced in the sympathetic 
direction would appear to be similar 
to the cluster of autonomic measures 
associated with experimentally 
aroused anxiety in the previously 
described studies. However, in
Wenger’s (1948) study of aviation 
cadets, operational fatigue patients, 
and neurotic patients the case for a 
clear-cut autonomic factor is shaky. 
The most striking thing about the 
reported intercorrelations is their 
extremely low level. Very few correlations are higher than .15.
Gunderson (1953), however, reported intercorrelations among his 12
resting state measures on a subsample of
44 paranoid schizophrenics that were 
both substantial, for this kind of 
data, and pervasive. There was a 
tendency for many of the different 
autonomic measures to correlate be- 
tween .20 and .45 in a direction con- 
sistent with degree of sympathetic 
arousal. 

In summary, intercorrelations
among physiological measures ob- 
tained under either resting states or 
under stress tend to be low and fre- 
quently insignificant. There are few 
studies, however, in which a variety 
of measures are obtained under a 
clearly fear arousing situation and 
where the tendency of change scores 
to correlate with initial level has been 
partialed out. Improved measure- 
ment technique may also make some 
of the older studies somewhat obso- 
lete. Nevertheless, the best guess on 
the basis of present findings is that 
intercorrelations among physiological 
measures will be found to be low even 
with the above-mentioned modifica- 
tions taken into account. Lacey's 
work suggests, consistent with the 
findings of low intercorrelations, that 
an individual responds to stress with 
a characteristic pattern of responses. 
This finding may not be entirely 
inconsistent with the possibility of 
there being some pattern of response 
usually associated with fear. For 
example, Lacey’s findings that sub- 
jects showed different response pat- 
terns to the stress of doing mental 
arithmetic may result in part from 
the fact that some subjects were made 
angry in the situation and some were 
made anxious, and that those that 
were made anxious showed a distinc- 
tive pattern from those made angry 
as Funkenstein et al. found. The 
chances are that this explanation 
does not account for all the individual 
response patterns, and it may be that 
among subjects made anxious there 
still remain different response patterns. The meaning of these different
response patterns, which could be 
few in number, may be clarified by 
further knowledge about their cor- 
relation with behavioral and perhaps 
self-report type measures. It is, of 
course, possible that future factor 
analytic or pattern analysis studies 
will suggest the utility of conceptual- 
izing several different kinds of anxiety 
states. 


Behavioral Measures: Experimental and Group Comparisons


The same question is asked here as was asked with respect to
physiological measures: is there some pattern of
behavioral effects associated with 
anxiety that can be distinguished 
from behavioral effects resulting from 
other arousal states? The researches 
most relevant to the question are those 
in the general area of the effects of 
stress on performance. These re- 
searches, unfortunately, do not pro- 
vide a clear answer to the question 
because of two major shortcomings. First,
most such studies tend to be limited 
to one dependent variable for the good 
reason that it is much more difficult 
to measure simultaneously a variety 
of appropriate behavioral responses 
than physiological responses. Second, 
few studies attempt to contrast a fear 
arousal state with other kinds of 
arousal states. Another general draw- 
back to most behavioral measures for 
the purposes of assessment, as will be 
shown in the studies reviewed, is that 
their relation to the anxiety con- 
struct is not a monotonic one; for 
example, a low score on a certain 
performance may be associated with 
a very low or very high state of anxi- 
ety. The studies mentioned below, 
then, can be seen as only suggestive 
of measures likely to be especially 
sensitive to the effects of anxiety, 
and are not intended to represent an 
extensive coverage of the research on 
the effects of stress on performance. 
Summaries of research in this area are provided by Hanfmann (1950),
Lazarus, Deese, and Osler (1952), and more recently Easterbrook (1959).

A loose empirical generalization that emerges from studies in this
area is that the kinds of tasks most likely to be affected by stress
are learning and memory tasks involving novel or relatively poorly
learned responses where incorrect competing responses are both numerous
and relatively strong, or perceptual tasks in which conditions are
imposed that make appropriate discriminations difficult. Thus, failure
stress (usually produced by first ego-involving, then failing, and then
criticizing the subject) has been shown to impair digit span but not
vocabulary items (Moldowsky & Moldowsky, 1952); to impair recall of
incidental learning but not recall of material explicitly instructed to
be learned (Aborn, 1953); and to impair relearning of a serial list of
nonsense syllables (Smith, 1954). Stress imposed by implying that the
subject is neurotic or maladjusted on the basis of projective test
responses has been found to impair performance on abstract reasoning,
the Holsopple Concept Formation Test, and mirror tracing (Beier, 1951);
and to produce more perseveration of incorrect responses on the Luchins
Water Jar Task (Cowen, 1952).

A number of studies in which anxiety is introduced by separating
subjects into high and low anxiety groups on the basis of the Taylor MAS
(1953) provide evidence not only that the detrimental effect of anxiety
becomes greater as the strength and number of incorrect competing
responses involved in the task increase, but also, for the levels of
anxiety involved, that performance is enhanced for the high anxiety
subjects on some tasks when the correct response is very dominant. The
incorrect competing responses are usually introduced by
increasing the similarity and some- 
times also by decreasing the associa- 
tion value of items in a serial learning 
task, or by both increasing intralist 
similarity and decreasing similarity 
between pairs in a paired associate 
learning task. Lucas (1952), Mon- 
tague (1953), Farber and Spence 
(1953), Lazarus, Deese, and Hamilton 
(1954), Taylor and Chapman (1955), 
Spence, Farber, and McFann (1956), 
and Spence, Taylor, and Ketchel 
(1956), all reported evidence for this 
relationship. The findings of greater 
ease of eyeblink conditioning in 
groups of high anxious as opposed to 
low anxious subjects (Spence & 
Farber, 1953; Spence & Taylor, 1951; 
Taylor, 1951) are also consistent with 
this general proposition. 

Lucas (1952) also studied the effect 
of experimentally induced failure 
upon performance as a function of the 
strength of the incorrect competing 
responses (manipulated by varying 
the number of duplications of conso- 
nants in a series of consonants being 
used in an immediate recall task). 
He found no main effect associated 
with number of duplications nor any 
interaction with four degrees of ex- 
perimentally induced failure. No 
other studies were found in which 
anxiety was induced experimentally 
and its effect upon performance 
studied where the strength of the in- 
correct responses was systematically 
varied within the confines of the 
same task. 

A few studies have made use of real life stress situations that
probably meet the need for a really anxiety arousing condition better
than the experimental procedures used in the other studies. Beam (1955)
obtained measures before doctoral oral examinations and opening night
performances in plays as well as at a less stressful period in the
subject's life, and found marked impairment in learning a serial list
of nonsense
syllables, and an increase in palmar 
sweat and GSR conditioning rate 
under stress as compared to nonstress. 
Basowitz et al. (1955) reported a 
tendency for digit span to be im- 
paired for soldiers undergoing para- 
troop training as compared to a con- 
trol group, and Wright (1954) like- 
wise found impairment in digit span 
in patients confronted with the threat 
of surgery. 

One kind of behavioral measure 
that would appear promising from an 
assessment point of view is speech 
disturbance. Mahl (1956, 1959) has 
developed a system for reliably scor- 
ing speech disturbances of various 
kinds and has shown certain of these 
disturbances to be related to varia- 
tion in anxiety as assessed in psy- 
chotherapeutic interviews. Dibner 
(1956) has employed a similar meas- 
ure. 

In the perceptual area Postman 
and Bruner (1948) reported impair- 
ment in the tachistoscopic perception 
of three-word sentences under failure 
stress. Rosenbaum (1953) found 
greater stimulus generalization under 
strong shock than weak shock. 
Smock (1957) reported greater in- 
tolerance of ambiguity in a percep- 
tual task under stress than nonstress. 
Korchin and Basowitz (1954), and 
Moffitt and Stagner (1956) found in- 
creased perceptual closure during 
paratroop training and experimental 
threat, respectively. 

In studies using group comparisons 
Angyal (1948) found more impair- 
ment in the recognition of patterns 
of letters under brief exposure condi- 
tions in high anxiety patients than 
other patients. Krugman (1947) and 
Goldstone (1955) found the threshold 
for flicker fusion to occur at a lower 
frequency for anxious than nonanxious subjects.

Eriksen and Wechsler (1955) in- 
geniously attempted to separate the 
effects of anxiety (shock induced) on
response processes as opposed to 
sensory discrimination, and con- 
cluded that anxiety results in re- 
stricted and stereotyped response 
preferences but does not impair 
sensory discrimination. 

In the studies reviewed so far in 
this section the effect of stress has 
been in general to impair perform- 
ance. There are many studies, how- 
ever, in which improved performance 
is associated with stress. Thus, 
Steisel and Cohen (1951) and Truax 
and Martin (1957) found improved 
performance on simple arithmetic 
problems as a result of failure stress; 
and Spence (1957) found better re- 
call of words failed on an anagrams 
task than words successfully com- 
pleted. 

Likewise studies in which groups 
have been divided on the basis of 
self-report measures of general anxi- 
ety level also indicate that failure 
stress can lead to improved perform- 
ance for some subjects. Thus Lucas 
(1952), Waterhouse and Child (1954), 
Williams (1955), and Sarason (1956) 
found that low anxiety subjects tend 
to improve under stress and high 
anxiety subjects tend to show im- 
pairment under stress. 

Thus, to the extent that failure 
stress arouses anxiety, this construct 
appears to be associated with both 
improvement and impairment of 
performance. These seemingly con- 
tradictory findings are in part recon- 
ciled in a study by Stennett (1957), 
who instead of employing just one 
stress and one nonstress condition 
attempted to set up four degrees of 
intensity of motivation. He found 
that tracking performance improved 
at first as the rewards for correct 
performance increased but then 
showed impairment under the most 
extreme condition involving a large 
bonus for high level performance and 
threat of electric shock if this level 
was not reached. He also obtained
palmar conductance and muscle po- 
tential measures on his subjects and 
found these measures to increase 
monotonically as a function of in- 
creased “motivation.” Several au- 
thors, consistent with this study and 
the others previously described, have 
proposed that adequacy of perform- 
ance is an inverted U shaped func- 
tion of some arousal, activation, or 
emotional state—for example, Wood- 
worth and Schlosberg (1954), Malmo 
(1957). 
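The inverted-U proposal, together with Stennett's pattern of a rising physiological index alongside rising then falling performance, can be pictured with a toy model. The quadratic form and all coefficients are assumptions chosen for illustration. They are not fitted to Stennett's data and do not represent any author's model.

```python
# Hypothetical illustration of the inverted-U proposal: performance first rises,
# then falls, as arousal increases, while a physiological index rises monotonically.
# Coefficients are invented; they are not fitted to Stennett's (1957) data.


def performance(arousal: float) -> float:
    """Toy inverted-U: a concave quadratic in arousal."""
    return 10 + 8 * arousal - 2.5 * arousal ** 2


def palmar_conductance(arousal: float) -> float:
    """Toy physiological index that increases steadily with arousal."""
    return 5 + 2 * arousal


levels = [0, 1, 2, 3]  # four graded conditions, lowest to highest incentive
perf = [performance(a) for a in levels]
cond = [palmar_conductance(a) for a in levels]

best_level = max(levels, key=performance)
print(perf)        # [10.0, 15.5, 16.0, 11.5]
print(cond)        # [5, 7, 9, 11]
print(best_level)  # 2
```

The point is only that the physiological index keeps rising across all four levels while performance peaks at an intermediate level. An arousal measure taken independently of the task can therefore locate a subject on the curve when performance alone could not.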

Thus there appear to be two rather 
loose empirical generalizations that 
can be reached on the basis of the 
studies reviewed in this section: 
(a) that tasks involving relatively 
stronger and more numerous competing responses are more subject to the
impairing effects of stress, and (b) 
increasing stress results in improved 
performance up to a point and im- 
pairment thereafter. There is no 
particular evidence in this area to 
warrant the separation of anxiety as 
a construct from other more general 
constructs such as “arousal,” “acti- 
vation,” or “drive.” 

Somewhat differing theoretical for- 
mulations have been proposed to 
account for the empirical generaliza- 
tions described. Easterbrook (1959) 
makes a plausible case for the idea 
that many of the disorganizing effects 
of emotion can be accounted for on 
the basis of cue utilization: namely, 
that increased “drive” or “emotion” 
leads to a constriction of the percep- 
tual field or decrease in the number of 
cues that can be attended to. The 
Iowa theorists, on the other hand 
(Spence, 1958), employ the concept 
of drive and its hypothesized multi- 
plicative relationship to habit 
strength to account for many of the 
effects of stress on performance; and 
Child (1954), Child and Waterhouse 
(1953), and Sarason, Mandler, and 
Craighill (1952) emphasize the irrelevant competing responses
specifically associated with stress on the basis of past learning.
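The multiplicative drive-by-habit idea attributed above to the Iowa theorists can be made concrete with a deliberately simplified sketch. This is not Spence's full formulation. The choice between a correct and a competing response is assumed to be decided by the difference in their excitatory potentials plus normally distributed oscillation of fixed spread, and every number is hypothetical.

```python
from statistics import NormalDist

# Simplified, hypothetical rendering of the multiplicative drive x habit idea.
# Assumptions (not from the source): the stronger of two competing tendencies
# wins with a probability set by normally distributed "oscillation" of fixed
# spread; inhibition, thresholds, and practice effects are ignored.
OSCILLATION_SD = 0.5


def p_correct(drive: float, habit_correct: float, habit_competing: float) -> float:
    e_correct = drive * habit_correct      # excitatory potential = D x H
    e_competing = drive * habit_competing
    return NormalDist().cdf((e_correct - e_competing) / OSCILLATION_SD)


tasks = {
    "correct response dominant": (0.6, 0.3),
    "strong competing response": (0.3, 0.5),
}
for name, (h_c, h_i) in tasks.items():
    low, high = p_correct(1.0, h_c, h_i), p_correct(2.0, h_c, h_i)
    print(f"{name}: low drive {low:.2f}, high drive {high:.2f}")
```

Raising drive multiplies the difference between the two potentials. In this sketch, that helps when the correct habit is the stronger one and hurts when the competing habit is stronger, in line with the high- and low-anxiety group findings reviewed earlier.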

If anxiety proves to be a distin- 
guishable arousal state, research on 
its effects on performance would be 
greatly facilitated if it could be assessed independently of the
performance being studied, perhaps by physiological measures. The
utility of this approach is shown in 
Stennett’s study, where it was not 
necessary to assume that experi- 
mental conditions were effective, or 
to rely upon some paper and pencil 
measure in determining the presence 
or magnitude of the motivational or 
emotional arousal state, but where 
instead the palmar conductance and 
muscle potential measures provided 
more direct evidence of the degree of 
arousal. 

In summary, no studies were dis- 
covered in which several objectively 
measured behavioral characteristics 
were obtained simultaneously (or al- 
most so) with a variety of physiologi- 
cal measures under conditions likely 
to be very fear arousing; much less, 
studies that in addition contrasted 
different types of arousal states. On 
the basis of the one and two variable 
type studies, though, it seems likely 
that some fairly simple learning, im- 
mediate memory, or perceptual tasks 
could be developed that would be 
sensitive to changes in anxiety level. 
It is possible that a few such tasks 
along with physiological measures 
could in the future help define more 
clearly the anxiety response pattern. 
In general, however, improved methods of continuous anxiety measurement
will probably contribute more to the study of the effects of anxiety on
behavior than vice versa.


Behavioral Measures: Intercorrelations


Studies oriented toward assessing 
the intercorrelations among a number of behavioral manifestations of
anxiety are beset by a special prob- 
lem. Physiological measures can 
usually be obtained simultaneously 
but many behavioral effects of anxiety 
can be assessed only by presenting 
the subject with a series of tasks to 
perform. Unknown order effects may 
well distort the obtained correlations. 

There have been several studies of 
this type in which a number of be- 
havioral measures, selected on the 
basis of previously reported relation- 
ships to anxiety, were intercorre- 
lated. Martin (1958, 1959) in two 
successive studies using college sub- 
jects, found the intercorrelations to 
be quite low, but a factor analysis 
still suggested the presence of a 
dimension that might be labeled 
anxiety. In the second study some of 
the measures that had the higher 
loadings on the anxiety factor were 
the Taylor MAS, .41; time to learn a 
complex (five choice) verbal maze, 
.40; errors in learning of paired as- 
sociate nonsense syllables with high 
intralist similarity but low similarity 
between pairs, .39; tremors on a 
manual dexterity task, .39; an anxi- 
ety check list, .27. A simple verbal 
maze (two choice) and a paired as- 
sociate list involving low intralist 
similarity and high similarity between pairs had near-zero loadings on
the factor. The loadings with respect to the two kinds of paired
associate lists and the two kinds of
verbal mazes are consistent with the 
notion that tasks involving stronger 
competing responses are more sensi- 
tive to the effects of anxiety. A some- 
what more prominent factor that 
also emerged in both studies was 
interpreted as a motivational factor, 
that is, a dimension reflecting how 
hard these college subjects tried on a 
number of the tasks. Such individual 
differences in motivation were postu- 
lated to be relatively independent of 
the subjects’ anxiety level. A third 
factor of some generality was identified as intelligence, and yet another
factor was entirely defined by self- 
report measures of anxiety such as 
the Taylor MAS. Thus performance 
on a given task such as learning 
paired associate nonsense syllables 
with high intralist similarity under 
mild stress was found not only to be 
affected by individual differences in 
anxiety but also by individual differ- 
ences in motivation, intelligence, and 
a factor specific to the type of task. 
Under these circumstances it is easy 
to see how anxiety variance could 
frequently be masked by other fac- 
tors. 

Rosenthal (1955), Cattell and 
Gruen (1955), and Scheier and Cat- 
tell (1958) reported several factor 
analytic studies in which a variety of 
self-report, behavioral, and, in some 
cases, physiological measures were 
obtained. They found a factor, which
they label anxiety, emerging in all 
their studies that is separable from a 
number of other personality factors 
after relatively blind rotations to 
oblique simple structure. The above 
studies employed substantial Ns in 
five different samples of subjects in- 
volving USAF pilot trainees, chil- 
dren, and college students. Upon 
inspection of the factor loadings on 
the anxiety factor in these various 
studies as summarized by Cattell 
and Scheier (1958b) it becomes ap- 
parent, however, that the only meas- 
ures with high loadings and the only 
measures whose loadings are con- 
sistent from study to study are those 
based on self-report type measures. 
Few if any behavioral-physiological 
measures have loadings over .30 and 
none of those that do are substan- 
tiated in any of the other samples. 
For example, in Rosenthal’s study 
(1955) the three highest loadings on 
the anxiety factor were Taylor MAS, .85; questionnaire measure of
anxious insecurity, .84; and a questionnaire
measure of nervous tension, .70. The 
other four measures with loadings 
above .30 were also self-report type 
measures. Rosenthal obtained sev- 
eral physiological measures under 
various conditions (GSR, heart rate, 
salivary volume, systolic blood pres- 
sure) and none of these were related 
to this anxiety factor to any degree. 
Under these circumstances it does 
not seem reasonable to accept this 
factor as necessarily assessing the 
hypothetical anxiety reaction as for- 
mulated in this paper. 

Cattell and Scheier (1958b) distin- 
guish between the “trait” of anxiety, 
inferred from factor analysis of a 
cross section of measures obtained 
only once on each subject, and the 
“state” of anxiety inferred from a 
factor analysis of change scores from 
one testing time to another. Cor- 
relating change scores in this way is 
referred to as incremental R tech- 
nique, and Cattell and Scheier 
(1958a) report in detail the results of 
such a study. An interesting innova- 
tion in this study, too involved to go 
into in this paper, was the introduc- 
tion of different “treatment” condi- 
tions into a correlational study, so 
that it was possible to see, for ex- 
ample, how imminence of academic 
examinations correlated with the 
other variables. One of the resulting 
14 factors was identified as the 
“state” anxiety factor and appears 
to represent an arousal state more 
closely related to the present theo- 
retical view of anxiety than the previ- 
ously found trait factor. The self- 
report measures did not dominate 
the loadings so much, although the 
two highest loadings were self-report 
measures involving an anxiety-tension check list, .41, and a
questionnaire scale of tension, .40. In addition, though, systolic
blood pressure
had a loading of .30 and palmar conductance of .26. Perhaps inconsistent
with this was the positive loading of 
volume of saliva, .27. The imminence 
of an academic examination was 
negatively loaded, -.25, suggesting
that just before an examination the 
usually anxious person becomes less 
anxious. The authors propose that 
“a person beset by vague fears and 
anxieties loses these anxieties for a 
while when a real danger threatens.” 

Holtzman and Bitterman (1956) 
intercorrelated 41 measures obtained 
on 135 cadets in an Air ROTC unit. 
These measures included ratings, 
personality tests, stress tests, per- 
ceptual tests, GSR conditioning, and 
amount of uric acid and glycine in 
the urine. The intercorrelations 
among the different kinds of meas- 
ures were quite low and a factor 
analysis yielded seven factors which 
were almost entirely determined by 
clusters of measures taken from the 
same test situation. 

There are some important limita- 
tions to the factor analytic approach 
to the study of anxiety. For example, 
there is no convincing logic to the 
supposition that simple structure, 
oblique or orthogonal, yields the most 
psychologically meaningful dimen- 
sions; although intuitively it would 
seem that some kind of oblique solu- 
tion would be more meaningful for 
separating out a cluster of physiological-behavioral measures to be
identified as anxiety as opposed to clusters
of measures representing other arous- 
al states, since in all likelihood these 
various arousal states will be corre- 
lated. With respect to rotations in 
factor analytic studies perhaps it 
would be better if such rotations were 
not done blindly but with full knowl- 
edge of the nature of the measures, 
and the final rotation considered 
frankly for what it is, a post hoc 
hypothesis about the nature of the 
dimensions revealed. Confirmation of the interpretation of a given factor
and further elucidation of the con- 
struct validity (Cronbach & Meehl, 
1955) of the assessment procedures 
can then be ascertained by introduc- 
ing the factor as a variable in experi- 
mental research. 

Certainly the selection of measures 
to be intercorrelated affects the defi- 
nition of the resulting factors. For 
example, it may be that in the Cattell studies just described, with the
exception of the incremental R technique study, the high
intercorrelations among the self-report measures, which almost entirely
define
the anxiety factor, are due in part to 
correlated nonanxiety variance. It is 
also possible that many of the meas- 
ures used in the factor analytic 
studies involve characteristic ways of 
controlling or reducing anxiety rather 
than more direct manifestations of 
the anxiety itself. The Holtzman and 
Bitterman study serves to point up 
the fact that in an area where corre- 
lations between measures obtained 
from different response systems are 
going to be low at best, including 
clusters of highly intercorrelated 
measures from the same response 
system or test situation will inevi- 
tably result in factors representing 
these clusters, at least when the com- 
mon criteria for simple structure are 
employed. It is possible that a factor 
analysis done under such conditions 
might serve to actually hide some 
real generalities of response, although 
there is no indication that such was 
the case in the Holtzman and Bitter- 
man study. 

One cannot conclude on the basis 
of the researches reviewed in this 
paper, despite many suggestive leads, 
that any clear-cut pattern of physiological-behavioral responses
associated with anxiety arousal, distinguishable from other arousal
patterns, has been demonstrated. The status of anxiety assessment
procedures,
both in terms of experimental and 
correlational findings might be clari- 
fied by combining some of the best 
features of the researches described. 
First, one might attempt to measure
simultaneously, or nearly so, an ex- 
tensive battery of physiological meas- 
ures and a few selected behavioral 
measures at a time when the subject 
is relaxed. This would necessitate a 
preliminary adaptation-to-the-appa- 
ratus session. Then the subjects 
could be tested again under defi- 
nitely anxiety arousing circumstances, 
the more realistic the better. A 
study of the change score patterns 
and intercorrelations, after correcting 
where necessary for correlation with
relaxed session levels, should provide 
evidence for an anxiety pattern if it 
exists. It would then be further 
necessary to demonstrate that the 
pattern of responses was distinguish- 
able from patterns associated with 
other arousal states such as general 
activation, anger, or sex; otherwise 
there is no utility in having a con- 
struct of anxiety separate from these 
others. 
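The passage does not say how scores should be "corrected where necessary for correlation with relaxed session levels." One common way to do this is a residualized change score: regress each subject's aroused-session value on the same subject's relaxed-session value and keep the residual. By construction, ordinary least squares residuals are uncorrelated with the baseline. The Python sketch below is an illustration only. It assumes one score per subject per session, and the function name and heart-rate numbers are hypothetical, not taken from any study reviewed here.

```python
"""Sketch: residualized change scores for a relaxed-then-aroused design.

Assumption (not from the source): the correction for baseline level is
done by least-squares regression of aroused scores on relaxed scores.
"""
from statistics import mean
from typing import List, Sequence


def residualized_change(relaxed: Sequence[float],
                        aroused: Sequence[float]) -> List[float]:
    """Regress aroused-session scores on relaxed-session scores (OLS with
    an intercept) and return one residual per subject."""
    if len(relaxed) != len(aroused):
        raise ValueError("need one relaxed and one aroused score per subject")
    if len(relaxed) < 2:
        raise ValueError("need at least two subjects")
    mx, my = mean(relaxed), mean(aroused)
    sxx = sum((x - mx) ** 2 for x in relaxed)
    if sxx == 0:
        raise ValueError("relaxed scores have zero variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(relaxed, aroused))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual = observed aroused score minus the score predicted from baseline.
    return [y - (intercept + slope * x) for x, y in zip(relaxed, aroused)]


if __name__ == "__main__":
    # Hypothetical heart-rate values (beats per minute) for five subjects.
    relaxed = [62.0, 70.0, 75.0, 81.0, 90.0]
    aroused = [75.0, 80.0, 88.0, 89.0, 101.0]
    raw = [a - r for r, a in zip(relaxed, aroused)]
    resid = residualized_change(relaxed, aroused)
    for r, d, e in zip(relaxed, raw, resid):
        print(f"relaxed={r:5.1f}  raw change={d:5.1f}  residualized={e:6.2f}")
```

The patterns and intercorrelations of these residuals across the physiological and behavioral measures could then be examined in place of raw differences, which tend to carry part of the baseline variance with them.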

When more is known about the 
physiological-behavioral response 
pattern associated with anxiety, then 
self-report scales can be constructed 
which will predict this response pat- 
tern in various situations. 
THE SECOND FACET OF FORGETTING:
The interference theory of forgetting assumes that the
extraexperimental occurrences of S-R sequences,
either before original learning of goal
responses² or interpolated between
original learning and recall, will induce a decrement at recall if
their stimuli are the same as or similar to those of the criterion
task and the responses are antagonistic. Thus, the
laws of forgetting reduce to the laws 
of proactive and retroactive inhibi- 
tion (Briggs, 1957; Bugelski & Cad- 
wallader, 1956; Osgood, 1949), with 
experimental extinction as the proc- 
ess whereby responses are weakened 
in interference paradigms (Adams, 
1952a; Briggs, 1954; Underwood, 
1948a, 1948b; Underwood & Post- 
man, 1960). More recently, Under- 
wood (1957) has shown the potency 
of proactive inhibition on the recall 
of verbal responses by demonstrating 
that the prior learning of verbal ma- 
terials has led us to greatly over- 
estimate the amount forgotten. Un- 
derwood’s 1957 study, combined 


1 Several psychologists read a draft copy of
this paper and improved it with their thoughtful
commentary. Acknowledgement is due to
A. M. Barch, R. C. Davis, M. R. Denny,
J. M. Digman, C. P. Duncan, J. C. Jahnke,
and B. J. Underwood.

2 The term “goal response” is used throughout
this paper as synonymous with “test
response” or “criterial response.” It is a feature
of overt behavior which the experimenter
records and uses as the dependent variable.


with recent research by Underwood
and Postman (1960) showing effects
on verbal recall from expected sources
of verbal interference outside the
laboratory, has materially strengthened the interference theory of
forgetting. Additional evidence for the interference theory has been
provided by
Steinberg and Summerfield (1957)
and Summerfield and Steinberg (1957, 
1959) who used nitrous oxide in the 
control of learned associations during 
interpolated rest. Osgood (1953, pp. 
593-597) presents a good review of 
research in support of the inter- 
ference theory where various tech- 
niques are used to control activities 
of the organism during the retention 
interval as a means of reducing op- 
portunities for learning competing 
responses. Other explanations of 
normal forgetting might eventually 
be shown to have validity also, but 
the preponderance of contemporary
evidence lies in support of the inter- 
ference theory and it will be used with- 
out further qualifications through- 
out this paper as the mechanism by 
which goal responses are directly in- 
fluenced and weakened during a re- 
tention interval. 

The purpose of this paper is to re- 
view evidence for the view that warm- 
up decrement (WU) is a second por- 
tion of the retention loss, arising 
from conditions other than direct in- 
terference with goal responses. Irion 
(1948) points out that the very special circumstances of stimulus and
response similarity required for in- 
terference, along with the amount of 
interfering activity required to pro- 
duce decrement in the originally 
learned responses, make it unlikely 
that the fortuitous experiences of 
everyday life outside the laboratory 
could induce significant amounts of 
one-factor forgetting. While the work 
of Underwood and Postman (1960) 
suggests that casual interference is a 
factor to be reckoned with, there is 
a strong prevailing sentiment in experimental psychology, supported by
research evidence, that WU is a second part of forgetting, independent
of direct interference with
the goal responses. As we shall see, 
the support for this two-factor view 
is not as secure as it might be. 


JACK A. 


HISTORICAL BACKGROUND AND 
DEFINITIONS 


The first systematic observations 
on WU arose from interest in fatigue 
and the characteristics of perform- 
ance curves under conditions of pro- 
tracted work, and they appear to 
have been made in the latter part of 
the 19th century by Kraepelin and 
his students (Arai, 1912). Studying a 
variety of tasks, these researchers ob- 
served that the initial segment of a 
performance curve was typified by a
rapid rise in efficiency, followed by a
much slower rate of increase or a de- 
cline when fatigue effects were pres- 
ent. They identified this initial 
rapid rise as WU, although in some 
cases it could have been considered a 
practice effect or simple reacquisition 
following one-factor forgetting. In- 
terestingly, these early investigators 
made an observation which enters the 
thinking of many later workers: that 
a rest period contains the simultaneous and opposing processes of
beneficial recovery from decremental work
effects, and loss of advantageous fac- 
tors whose reinstatement occurs dur- 
ing the warming-up period. 

Mosso (1906) reported anecdotal
accounts of the need for poets and
writers to warm up before a period
of productive work could begin.
Wells (1908) observed the rapid in- 
crease in initial postrest performance 
on a tapping test which, by this time, 
generally had become identified as 
WU. Thorndike (1914) in a chapter 
“Mental Work and Fatigue” gives a 
more careful definition than previous 
investigators: 
The best definition of “warming-up” as an 
objective act is that part of an increase of
efficiency during the first 20 minutes (or some 
other assigned early portion) of a work period, 
which is abolished by a moderate rest, say of 
60 minutes (p. 66). 


One other quotation from Thorndike 
is particularly significant: 

It should also be noted that intellectual warm- 
ing-up in the popular sense refers rather to 
fore-exercise of other functions, in order to get 
materials and motives with which and by 
which the given function is to work, than to 
an intrinsic alteration of it (pp. 67-68). 


Thorndike’s definition of WU as a 
rapid increase of efficiency during 
the initial postrest period is con- 
sistent with that of earlier writers. 
Thorndike, in these quotations, 
makes the influential observation 
that the WU segment is something 
other than strengthening of goal re- 
sponses with practice, and he clearly 
makes this point in the second quota- 
tion (pp. 67-68) when he identifies 
intellectual WU as the fore-exercise 
of other functions. It is this identifi- 
cation of WU with factors support- 
ing goal responses which stands as 
the foundation of the two-factor 
theory of forgetting, and apparently 
Thorndike was the first to make it. 
Thorndike’s observations stand as 
the most important historical predecessors of contemporary views, but
other investigators made observa- 
tions on warm-up too, with an occa- 
sional experiment. Watson (1919, 
pp. 354-355) assumed that WU ap- 
peared only for heavy muscular work 
and that the warming-up period was 
a time of increased glandular action. 
Robinson and Heron (1924) defined 
WU as “a rise in efficiency which is 
steeper and more temporary than the 
rise which can be seen, let us say, in 
successive daily performances” (p. 
81). Robinson (1934) essentially re- 
peated his 1924 views. Snoddy (1935) 
presented the first data from a rela- 
tively large group of subjects which 
showed WU following rest. He em- 
ployed a mirror-tracing instrument 
as his experimental device. Bell 
(1942) performed an experiment on 
the Rotary Pursuit Test (Melton, 
1947) on the effects of varying 
amounts of rest interpolated early 
and late in practice. Warm-up dec- 
rement, as measured by the differ- 
ence between the first and second 
postrest trial, was found to first in- 
crease and then decrease with 
amounts of interpolated rest rang- 
ing from 1 minute to 30 hours. This 
trend held both early and late
in practice.

Post-World War II research dis- 
played an accelerated interest in WU 
and produced more careful definitions, hypotheses concerning its
underlying nature, and specific experimental tests. The modern
investigators generally followed the leads of
their predecessors. Ammons (1947a), 
in a miniature system of variables de- 
termining rotary pursuit perform- 
ance, measured WU as the difference 
between the score on the first post- 
rest trial and a point on the perform- 
ance curve estimated as the level 
that would have occurred had there 
been no need for warming-up. Irion
(1948, p. 338) defines WU on the re- 
sponse side in terms of the greater 
slope of the initial segment of the 
postrest curve relative to the slope 
of the original learning curve at a 
corresponding level of initial profi- 
ciency. The response definitions by 
Ammons and by Irion amount to 
about the same thing and, along with 
their theoretical views to be dis- 
cussed subsequently, have been the 
mainstay of most workers in the area 
since the war. A significant feature 
of these definitions is that they do 
not imply an actual decrement from 
the last prerest trial to the first post- 
rest trial and, in this sense, the com- 
mon reference to “decrement” is a 
misnomer. Consistent with most 
early observations on WU, the defi- 
nitions involve an expression of the 
sharp initial rise in a postrest per- 
formance curve and are independent 
of whether there is an overall gain 
or a loss over rest. It is a decrement 
only in the sense that initial postrest 
performance is below an expected 
level because of WU, and this expected level is not always below the
level on the final prerest trial. This
is the interaction of work and WU 
effects over rest which drew the attention of Kraepelin and his
associates (Arai, 1912). Figure 1 illustrates WU and its appearance under
conditions of massed and distributed 
practice (from Adams, 1952b). The 
Rotary Pursuit Test was used and 5 
days of practice were administered, 
with 36 ten-second trials given each 
day. Massed practice was 6 minutes 
of continuous practice, and distrib- 
uted practice had a 40-second inter- 
trial rest interval. Eighteen subjects 
were in the massed group and 21 in 
the distributed group. These data 
are a good example of WU manifesta- 
tions, although a subsequent section 
Fig. 1. Illustrations of warm-up decrement under conditions of massed and distributed
practice on the Rotary Pursuit Test. (From Adams, 1952b)


will point out that motor WU has a 
different status at this time than 
verbal WU. The massed group shows 
several instances of reminiscence 
from the final prerest trial to the first 
postrest trial, but the steep, initial 
rise in each postrest segment is taken 
to be WU resulting from a decre- 
mental process opposing the gain 
over rest. Adams measured WU as 
the difference between the first post- 
rest trial and the score on the trial at 
the peak of the rise before the de- 
cremental segment begins. For the 
distributed group, however, the 
method of WU measurement can be 
the same or it can be measured as a 
decrement from the last prerest trial 
to the first postrest trial because 
reminiscence is absent (Barch, 1954; 
Reynolds & Adams, 1954). Being 
able to measure it as an actual decre- 
ment is somewhat more precise be- 
cause it does not involve judgments 
of the termination point of the WU 
segment. Digman (1959) replicated 
Adams’ study in most of its aspects 
and obtained the same trends. 
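The response-side measures described above reduce to simple arithmetic on trial scores. Ammons' measure takes an expected level minus the first postrest trial. Adams' measure takes the peak of the initial postrest rise minus the first postrest trial. When reminiscence is absent, WU can instead be taken as the last prerest trial minus the first postrest trial. The Python sketch below illustrates all three. The function names and sample scores are hypothetical. The automatic rule for locating the peak is also an assumption: it stands in for the experimenter's judgment of where the decremental segment begins, and the text notes that this judgment is what makes the peak method less precise.

```python
"""Sketch of three warm-up decrement (WU) measures.

Scores are per-trial performance values where higher is better (for
example, time on target per Rotary Pursuit trial). Names and data are
hypothetical illustrations, not taken from the studies reviewed.
"""
from typing import Sequence


def wu_from_expected_level(expected_level: float,
                           postrest: Sequence[float]) -> float:
    """Ammons-style: expected level (estimated by the analyst, e.g. by
    extrapolating the performance curve) minus the first postrest score."""
    if not postrest:
        raise ValueError("need at least one postrest trial")
    return expected_level - postrest[0]


def peak_of_initial_rise(postrest: Sequence[float]) -> float:
    """Score at the end of the initial postrest rise.

    Assumption: the rise ends at the first trial whose successor is not
    higher. The original measure relied on the experimenter's judgment.
    """
    if not postrest:
        raise ValueError("need at least one postrest trial")
    i = 0
    while i + 1 < len(postrest) and postrest[i + 1] > postrest[i]:
        i += 1
    return postrest[i]


def wu_peak_method(postrest: Sequence[float]) -> float:
    """Adams-style: peak of the initial rise minus the first postrest score."""
    return peak_of_initial_rise(postrest) - postrest[0]


def wu_decrement_method(prerest: Sequence[float],
                        postrest: Sequence[float]) -> float:
    """Actual decrement: last prerest score minus first postrest score.
    Appropriate only when reminiscence is absent (e.g. distributed practice)."""
    if not prerest or not postrest:
        raise ValueError("need prerest and postrest trials")
    return prerest[-1] - postrest[0]


if __name__ == "__main__":
    # Hypothetical distributed-practice scores, for illustration only.
    prerest = [10.2, 10.8, 11.1]
    postrest = [9.0, 10.5, 11.4, 11.2, 11.0]
    print(round(wu_peak_method(postrest), 2))                # 2.4
    print(round(wu_decrement_method(prerest, postrest), 2))  # 2.1
```

With these invented numbers, the peak method and the decrement method give different values, 2.4 and 2.1. This illustrates why the choice of measure should be reported alongside any WU figure.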


EXPLANATORY HYPOTHESES 
Set 


WU must be defined in terms of 
operations independent of those for a 
one-factor forgetting interpretation 
which, for the interference hypothesis 
of forgetting, would be in terms of 
responses conflicting with goal re- 
sponses and causing extinction of 
them. If WU is considered merely as a performance level below that
expected at the beginning of a postrest practice session, it is just as
meaningful to regard
it as a simple one-factor forgetting 
loss for the goal response, with WU 
being a completely superfluous notion. 
With the exception of Doré and Hil- 
gard (1938) and Hilgard and Smith 
(1942), pre-World War II investiga- 
tors demonstrated a lack of methodo- 
logical caution by simply assuming 
WU as a phenomenon separate from 
one-factor forgetting. The Iowa 
studies of psychomotor interference 
in the postwar era (Lewis & McAl- 
lister, 1950; Lewis, McAllister, & 
Adams, 1951; Lewis, Shephard, & 
Adams, 1949; Lewis, Smith, & McAllister, 1952; Shephard, 1950;
Shephard & Lewis, 1950) exhibited a similar conservatism by suggesting
an interpretation of WU consistent with the one-factor interference
theory of forgetting. They held that the learning of responses in a
laboratory task involves the extinction of conflicting
responses either from prior tasks 
learned in the laboratory or from ex- 
tralaboratory tasks. When a rest 
period is introduced the extinguished 
responses spontaneously recover some 
of their strength and, when postrest 
practice is resumed, the increased 
strength of these responses results in
heightened conflict with the goal responses, and WU occurs. As postrest
practice continues, the conflicting responses are once again extinguished
and WU dissipates. While these one- 
factor views are parsimonious, and 
are therefore desirable, the parsimony 
may be unwarranted. The “other
functions” which Thorndike (1914)
identified with WU imply that a one-factor interpretation is an
insufficient explanation of WU, and
Thorndike’s early view is given a more
explicit, and testable, expression in
the set hypothesis of WU. In the
postwar era Irion (1948) gave the first
operationally independent statement
of WU in terms of a set state of the
subject, and this definition was distinct from a one-factor forgetting
definition. The term “set” has a
number of meanings in psychology
(Gibson, 1941), but Irion provided a
sufficiently sound operational definition of set within the WU context
to provide testable predictions. Although inhibition hypotheses of WU
have been attempted, and will be discussed, the set hypothesis has the
most status and has been the framework for most of the systematic
research on WU.
If set is to be objectively assessed
for its utility in the scientific descrip- 
tion of behavior, it must be defined 
in terms of manipulable environ- 
mental events, on the one hand, and 
objective measures of behavior on 
the other. Furthermore, its opera- 
tional definition on the environ- 
mental side must be different from 
those defining other behavioral processes, which, for our present
purposes, means differentiating set variables from one-factor
forgetting variables. The independence of defining operations is
critical for testing a two-factor theory 
of forgetting even though forgetting 
and set loss exert highly similar 
effects on dependent response meas- 
ures. Just as long as one-factor for- 
getting is defined by the retroactive 
and proactive inhibition paradigm 
with experimental extinction as the 
process, and WU is defined by other 
operations related to a different proc- 
ess such as set, they both can be re- 
tained for the description of behavior 
because they can be independently 
manipulated and measured. This 
would be the justification for a scien- 
tifically sound two-factor theory of 
forgetting. Irion’s paper (1948) made 
the two-factor distinction for verbal 
learning and consequently has given 
a basis for the objective assessment 
of set as a determiner of behavior. 
The use of set with respect to motor 
behavior has not been grounded in 
definitions as clear as those for verbal 
behavior, but this will be discussed 
later. 

Irion’s conception of set has much 
in common with those of Bell (1942) 
and Ammons (1947a) for motor 
learning where set is considered to be 
an aggregate of postural and atten- 
tive adjustments which are posi- 
tively related to performance of the 
goal response. Complex performance, such as the learning of a verbal
list, involves more than the external 
goal stimuli which the experimenter 
has objectively defined and controls, 
and to which the subject links the 
goal response measured by the experi- 
menter. In addition, various second- 
ary responses are learned, such as the 
orientation patterns for visual re- 
ceptors, proper postural attitudes, 
and muscular tensions. These re- 
sponses are secondary mainly in the 
sense of not being directly measured 
but the efficiency of the goal re- 
sponding is intimately linked to them. 
Irion hypothesizes that these sec- 
ondary responses, or set, are dis- 
turbed by the subject's activities be- 
tween original learning and recall 
and this loss of set is the underlying 
cause for the steep slope of the initial 
segment of the relearning curve 
which is called WU. The disruption 
of set could operate to induce the dec- 
rement in retention in at least three 
ways: (a) failure of the receptors to 
adequately receive the goal stimuli, 
(b) mechanical inefficiency for optimum goal responding because the
subject does not have the proper pos- 
ture or muscular tension patterns, 
and (c) change in the internal stimu- 
lation which is part of the stimulus 
complex to which goal responses are 
conditioned (Guthrie, 1952). This 
third possible cause of WU could be 
a function of one or both of the first 
two because if the secondary re- 
sponses are disturbed, then their pat- 
terns of response-produced stimula- 
tion change and the performance 
level of the goal responses condi- 
tioned to these internal cues is re- 
duced. 

While set is disrupted by activities 
during the rest interval, and thus is a 
kind of interference theory, the hypothesis is distinguished from the
one-factor interference theory of forgetting by emphasizing the role of
nongoal, secondary responses and,
importantly, by specifying that these
secondary responses are a function of
operations different from those defining the strength of goal responses.
The interference theory is concerned with practice variables which
strengthen goal S-R sequences and increase their resistance to
forgetting, and with interfering S-R sequences which weaken goal S-R
sequences by experimental extinction. Set, on the other hand, is
strengthened by performance of S-R sequences that are neutral with
respect to goal S-R sequences and that overcome WU by restrengthening
secondary responses, not the strength of goal S-R sequences. While it
is true that practice of goal responses appears to strengthen set, as
the elimination of WU in relearning testifies, this is only because the
goal responses are enmeshed in a matrix of secondary responses and
their practice is concurrently accompanied by the practice of secondary
responses. However, the task elements which define the learning problem
for secondary responses can be embodied in a separate neutral task and
can be used to strengthen set independently of goal response practice.
The weakening of set during the retention interval is also presumed to
be caused by interfering activities neutral to goal responses, but their
characteristics are unspecified at this time. It might be presumed, for
example, that general body movements would disrupt the particular
postures and muscular tension patterns acquired in the criterion task.

Of course, there may be nothing to
the set hypothesis, because all retention loss could be one-factor
forgetting in terms of direct effects on goal responses. But even
given the general
terms within which the hypothesis is
stated, the scientific criteria are 
broadly met for verifying that a por- 
tion of the retention loss can be 
ascribed to something other than 
one-factor forgetting, and it should 
be possible to find neutral tasks 
whose performance in a retention 
interval would reinstate set and 
abolish WU but would not yield 
habit strength increments for goal 
responses. Furthermore, if set is a 
determiner of performance as Irion 
says, practice on a neutral task 
should enhance performance on a 
criterion task before original learn- 
ing by strengthening advantageous 
secondary responses. Effects on re- 
call and original learning will be 
treated separately in the sections 
that follow. 


Verbal Behavior 


Recall. Taking a cue from Ward's
experiment (Ward, 1937) where the 
subject’s association of colors during 
rest benefited verbal recall, Irion 
(1949b) tested the set hypothesis by 
using one trial of a neutral color-nam- 
ing task as a warming-up activity 
just before the recall of paired adjec- 
tives after 24 hours. The subject was 
not required to memorize colors but 
only to name them as they appeared 
in the window of a memory drum. 
Color-naming, then, did not in any 
way involve practice of goal re- 
sponses but did serve to reorient the 
subject to the rhythm of responding 
and direct his visual attending, pos- 
ture, and physical adjustments in a 
manner very similar to that required 
in the criterion task and should func- 
tion to restore the subject’s set to 
respond. Irion found that a rest con- 
trol group which had conventional 
recall after the 24-hour interval displayed a significant performance
loss, but the color-naming group on
the first recall trial was significantly 
superior to the rest control group, 
and not different from a no-rest con- 
trol group. This is in good accord 
with the set hypothesis. In re-estab- 
lishing performance in the relearning 
trials, Irion found the one trial of 
color-naming essentially equivalent 
to one trial of practice on the crite- 
rion task. A related study reported in 
the same paper demonstrated that 
first trial recall was a decreasing func- 
tion of the length of the rest interval 
up to 24 hours and that the slopes 
of the relearning curves were a func- 
tion of the length of the rest interval. 
Irion interpreted this experiment as 
being in accord with the set hy- 
pothesis and the definition of WU in 
terms of the slope of the postrest 
performance curve. Since his color- 
naming experiment demonstrated 
that retention loss occurring after 24 
hours could be eliminated by one 
trial of warming-up activity, it seems 
safe to assume that this decreasing 
recall function shows increasing WU 
and loss of set over interpolated time. 

Irion and Wham (1951) tested an 
implication of the set hypothesis that 
WU should be a decreasing function 
of the amount of set-reinstating ac- 
tivity. The criterion task was serial 
rote learning of nonsense syllables 
and the warming-up activity was rec- 
itation of three-place numbers. The 
retention interval was 35 minutes. 
Warming-up had a significant effect 
on the first recall trial, with perform- 
ance level being a positive function 
of the number-naming trials. Also, the rate of increase of the initial
WU segment of the relearning curves tended
to be inversely related to the amount 
of warming-up. This study extends 
Irion’s earlier work and represents 
good support for the set version of WU.
One interpretation that could be 
independent variables are manipulat-
ing WU through changes in set. 
Irion (1949a) performed similar re- 
search with the Rotary Pursuit Test 
and related WU to the amount of pre- 
rest practice and the duration of rest. 
His paper reflects the same method- 
ological problem. 

Efforts to locate a neutral task 
which would influence WU in the re- 
call of motor performance, as well as 
the level of original learning, have 
met with failure. The most thorough 
experimental attempt was by Am- 
mons (1951) using the Rotary Pur- 
suit Test. His subjects were admin- 
istered initial practice, rest, and a 
postrest practice session. Set-rein- 
stating activities were either watch- 
ing the disk, blindfolded manual 
performance of the rotary motion by 
holding a small rivet set in the rotor 
plate, or imaginary practice where 
the subject was merely to think 
about practicing. These activities 
were administered either before ini- 
tial practice, before the postrest 
practice session, or before both prac- 
tice periods. No effects were observed 
from any of the experimental treat- 
ments. Walker, DeSoto, and Shelly 
(1957) performed a bilateral transfer 
experiment on the Rotary Pursuit 
Test. Original practice was with one 
hand and, following rest, practice 
was resumed with the other hand. 
One of the experimental conditions 
was to have one trial of practice just 
before the postrest session with the 
prerest practice hand to see if it could 
have a warming-up effect on the 
transfer hand. WU was found in per- 
formance on the transfer hand but it 
was unaffected by the warming-up 
procedure and they concluded that 
WU must be quite specific to an effec- 
tor. Hamilton and Mola (1953) used 
a finger maze and evaluated the effect
of practice on five different mazes on
performance in a criterion maze.
They used an experimental di
warming-up mazes either 24 ho
immediately preceding the test maze.
Positive transfer to the criterion
maze was found, but it was about the
same for both warm-up groups
ing-up properties. A small encouraging sign to counter this negative
evidence is found in a study
period one group was required to
observe a partner's performance and
press a button every time he judged
him to be on target. This activity
produced work inhibition, but it also
tended to result in less WU than was
found for control groups. Reduced WU was a secondary finding
in a study on another topic, but it is a
lead on a likely set-reinstating activity for motor performance.


Negative Evidence 


Ordinarily the amount of evidence
which has been cited in support of
the set hypothesis for verbal learning
would be sufficient to give a good
measure of security to the two-factor
theory in psychology, but unfortunately there is a disconcerting number
of negative findings. In a careful
effort to replicate Irion’s verbal
learning study (1949b), Rockway and
Duncan (1952) were unable to reproduce Irion’s results and show
an effect of color-naming on recall. Similarly, Withey, Buxton, and
Elkin (1949) and Hovland and Kurtz
(1951) failed to show an influence of
color-naming on verbal recall. Underwood (1952), studying the serial
learning of nonsense syllables, was 
unable to demonstrate that the 
warming-up activity of number-nam- 
ing prior to recall produced an effect 
on WU after 24 hours of rest. Underwood said that this did not necessarily
contradict previous findings by Irion 
and others because earlier studies did 
not use the same subjects in several 
experimental conditions as he had 
done. Dinner and Duncan (1959) 
hypothesized that the unreliability of 
the effects of color-naming on verbal 
recall might be a function of degree of 
original learning. Using a low, me- 
dium, and high degree of original 
learning of paired adjectives, they 
found that color-naming influenced 
recall only when level of original 
learning was high. They concluded 
that Irion’s positive use of color- 
naming (Irion, 1949b) based on a 
medium degree of original learning 
should be considered sampling error 
and discounted. However, this judg- 
ment should be regarded with caution 
because it does not consider the works 
of Hartley (1948) and Irion and 
Wham (1951) who all obtained posi- 
tive effects of warming-up activities 
on verbal recall when low or inter- 
mediate levels of original learning 
were used. The Dinner and Duncan 
investigation makes an original con- 
tribution in showing an effect of the 
amount of original learning, but it 
cannot be reconciled at this time with 
verbal learning studies which effec- 
tively used warming-up activities to 
enhance recall. 

Another disturbing consideration 
for understanding WU and the set 
hypothesis is that there are tasks 
where performance in the initial postrest trials does not show WU. The
inverted alphabet printing task has 
enjoyed moderate popularity for the 
study of work inhibition and the
data never show WU (Archer, 1954;
Eysenck, 1956; Kimble, 1949; Wasserman, 1951). Silver (1952) used the
inverted alphabet printing task to 
investigate the interaction of warm- 
up activity on WU and work inhibi- 
tion, but since he used performance 
on the first postrest trial for his com- 
parisons there is no reason to believe 
that he was manipulating WU or, 
indeed whether his data even dis- 
played WU. Other investigators using 
inverted alphabet printing failed to 
show WU, and it is unlikely that 
WU as it has been defined in terms of 
a rapid increase in performance on 
the initial postrest trials was even 
present in Silver's data. Bilodeau 
(1952a, 1952b) investigated work inhibition by manipulating the physical
load required to turn a manual crank, 
and found no WU in his data. Doten 
(1955), in a study of interference, 
used a task where the subject was 
presented the printed names of 
colors, but where the color of the letters in the name was different
from the color named. The word
“Red” might be printed in blue, for 
example. The task of the subject in 
original learning was to respond by 
stating the actual color of the letter- 
ing, and no WU was indicated. 
The initial segments of performance 
curves on each day had an immedi- 
ate decrease in speed of responding, 
not the rapid increase which is char- 
acteristic of WU segments. Tasks 
such as these raise serious definitional
problems for set, or any other
hypothesis for that matter. We cannot expect any explanatory
hypothesis to enjoy a good degree of success
until the tasks in which WU occurs
have been established. Ideally, the
set hypothesis should contain statements relating WU and task
characteristics.
Inhibition

The principal advocate of an inhi- 
bition hypothesis is Eysenck (1956) 
who interprets WU as mainly being 
attributable to the extinction of 
Hull’s conditioned inhibition (Hull, 
1943). Eysenck does not completely 
deny the set hypothesis, but rather 
considers loss of set a lesser contribu- 
tor to WU, with the extinction of con- 
ditioned inhibition being the primary 
reason for the trend of the initial 
postrest segment. It will be recalled 
that Hull has a two-factor theory of
inhibition. One construct, IR, is an
increasing function of the number of
responses and amount of physical
work, and a decreasing function of
the rest interval. Moreover, IR has
drive properties, and its dissipation is
regarded as drive reduction. Since
drive reduction in Hull’s system is
the basis of reinforcement, an increment of habit strength for the
ongoing response is accrued whenever IR dissipates. Because the subject
is resting when IR is dissipating, it is theorized that a resting
response is strengthened which is antagonistic to the goal response.
The habit construct for the resting response is the
second inhibitory factor, sIR. These
two types of inhibition summate and
subtract from the excitatory potential (sER) for the goal response to
yield effective excitatory potential (sĒR), which is the primary
determiner of overt performance level.
The massed group in Figure 1 can be 
used to illustrate Eysenck's applica- 
tion of Hull's inhibition theory to 
WU. In the first session the subject 
responds continuously and Ir ac- 
crues. Then, over rest, Ip dissipates 
and an increment of sJp develops. 
The failure of performance on the 
first postrest trial to reminisce to the 
level of the distributed group is taken 
as evidence for the presence of sJp. 


When the subject begins practice in the second session, the goal response is now being reinforced, and the nonreinforced resting response undergoes experimental extinction. This period of extinction of the resting response is revealed as the WU segment, according to Eysenck. One immediate prediction from Eysenck's hypothesis is that little WU should be found under well-spaced practice conditions, where a negligible amount of Ir is generated on each trial, and thus negligible sIr. Eysenck tested this deduction using the Rotary Pursuit Test and, in accord with his prediction, found WU under conditions of massed practice but not distributed practice. His findings and conclusions are tenuous, however, because of the instances in the experimental literature showing WU under conditions of widely distributed practice on the Rotary Pursuit Test. Figure 1 is a good example (Adams, 1952b). Other examples are Ammons (1950), Denny, Frisbey, and Weaver (1955), Digman (1959), Kimble and Shatel (1952), and Jahnke and Duncan (1956). There is no immediate explanation for Eysenck's unusual finding, but WU under conditions of well-distributed practice is a commonplace finding and suggests that Eysenck's hypothesis cannot be taken seriously.
Adams (1952b) entertained a dif- 
ferent inhibition hypothesis. Ob- 
serving that much of the evidence for 
WU came from studies on the Rotary 
Pursuit Test under conditions of 
massed practice, he deduced the 
characteristics of a postrest perform- 
ance curve from a negatively acceler- 
ated growth of reaction potential 
with trials and an ogival function for 
the accrual of work inhibition when 
practice is massed. It was predicted 
that WU should not appear under 
conditions of distributed practice. The occurrence of clear-cut WU when training was widely distributed (Figure 1) led to rejection of the hypothesis.

At present it must be concluded 
that no convincing evidence exists in 
support of an inhibition explanation 
of WU. 


WARM-UP IN ANIMALS


The main concern over WU has 
been with human subjects, but it is 
noteworthy that the phenomenon 
also has been observed in animals. 
The following studies are not meant 
to represent an exhaustive search of 
the literature on animal behavior, 
but rather are intended to show the 
ubiquity of WU-like effects and that 
its characteristics are not found only 
in human response records. Schlosberg (1934, 1936) interprets as WU the failure of a well-learned conditioned response to occur in the white rat on the first few trials of a
learning session. Ellson (1938), in 
studying extinction of a bar-pressing 
habit in the rat, found rate of re- 
sponse slower in the first fifth of the 
extinction trials than in the second 
fifth. He interpreted this as WU and
explained it in terms of Guthrie's 
theory which holds that the stimuli 
to which a response is learned include 
internal stimuli resulting from pos- 
ture, movement, etc. Later responses 
are partly conditioned to the re- 
sponse-produced stimuli of earlier re- 
sponses and we would expect that 
later responses in a series would have 
greater strength because of the pres- 
ence of the response-produced stimuli 
to which they are conditioned. In the 
latter part of extinction the effects of 
nonreward overcome this trend and 
the performance level then decreases 
systematically. Finger (1942) used 
rats in a straight runway situation 
and found WU revealed when an extinction series was administered after
24 hours. The second extinction trial 
actually had performance superior to 
that of the first extinction trial—a 
finding contrary to expectation for an 
extinction series. Finger’s finding for 
extinction is quite similar to Ellson’s. 
Verplanck (1942) reported WU for 
rats in a simple running task. Like 
many investigators of human be- 
havior, these animal researchers 
freely labeled decrements in initial 
postrest segments of performance 
records as WU although the decre- 
ments could just as well, and more 
economically, have been explained by 
the one-factor forgetting hypothesis. 


DISCUSSION AND CONCLUSIONS


Virtually all support for the two- 
factor theory of forgetting is em- 
bodied in experiments which have 
demonstrated that WU is reduced or 
eliminated by repetition of responses 
that orient the subject to the general 
task demands (e.g., color naming)
but which do not involve direct prac- 
tice of goal responses. By themselves 
these experiments might be sufficient 
to establish set as a second factor nec- 
essary for the explanation of reten- 
tion loss, but the studies where set- 
reinstating activities have failed to 
influence recall in both motor and 
verbal tasks, and the tasks where no 
WU whatsoever has been found, 
leave the second factor in doubt. 
There is not sufficient evidence to re- 
ject the set hypothesis but neither 
are there grounds for firmly retaining 
it. Certainly it is the most tenable of 
all hypotheses advanced, but a great 
deal of careful research seems required before the set hypothesis, and thus the two-factor theory of forgetting, can be accepted or rejected with confidence.
It is unlikely that a decision will ever be made about the set hypothesis unless it receives a more thorough testing than it has in the past. The set-reinstating experiments are very broadly derived from the hypothesis and have not tested its more explicit implications. By viewing the set hypothesis in its more detailed aspects, and by attempting to develop specific experiments and measures to verify these details empirically, it should be possible not only to clarify the status of the set hypothesis but also to determine why some tasks display WU and others do not. For example,
Irion (1948), Ammons (1947a), and 
Bell (1942) all contend that the 
acquisition of set can be the learning 
of beneficial postures and muscular 
tensions that facilitate the occurrence 
of goal response sequences. Rest pe- 
riod activities disturb the favorable 
set and the WU segment of a post- 
rest performance curve represents the 
reacquisition of these favorable bodily attitudes. If there is anything to this version of the set hypothesis, it would seem fruitful to explore the characteristics of bodily tensions by direct measurement and then relate them to changes in performance of the goal response. Davis and his associates have performed a number of studies (e.g., Davis, 1940, 1956) showing the relationship between the characteristics of overt responding and muscular tensions as revealed by electromyographic measurement techniques. Davis (1956) does not
believe that the muscular substrata 
and the overt goal responses need be 
conceptualized as fundamentally dif- 
ferent. A state of tension in skeletal 
muscle is the same as any other 
muscular contraction, i.e., it is a re- 
sponse configuration. Davis (1956) 
says: 


Muscular tensions would then be themselves responses to stimuli, many being small responses, detectable only with instruments, but with no firm boundary between them and the larger muscular activities associated with movement (p. 2).


Davis' work is suggestive for the set hypothesis because it strongly hints that electromyographic measures of muscular tension taken during the WU segment of the postrest performance curve would show levels and patterns of tension different from those at final prerest performance, shifted in the direction associated with poorer performance. Moreover, the reacquisition of prerest values and patterns of muscular tension should parallel the trend of the WU segment. Furthermore, and importantly, it suggests that neutral set-reinstating activities will produce electromyographic changes signifying that the favorable muscular tensions existing at final prerest performance are being re-established.

There are difficulties in operationally distinguishing between an electromyographically verified muscle-tension version of the set hypothesis and Irion's alternative Guthrian hypothesis that loss of set is a disturbance of the internal stimuli to which goal responses are partly conditioned. If changes in muscle-tension secondary responses in turn produce a lower level of goal responding, we cannot be sure whether the lower level is due to quasi-mechanical considerations, where muscular tensions underlie useful postures and bodily attitudes, or to changes in the population of stimuli to which the goal responses have been conditioned. Despite the potential difficulties of interpreting the primary effects of muscle-tension secondary responses on performance of the goal responses, it would be a fundamental finding to show a systematic covariation of electromyographic measures and WU phenomena. The evanescent quality of set would benefit from a diversity of approaches at this time to provide clues for a reconciliation of inconsistencies among the various experimental findings.

The delineation of set and its role in retention will sharpen our understanding of the retention loss problem and will improve our efforts to predict and control it. Underwood (1957) has shown that our frequent use of the same subjects in several laboratory experiments has led us to greatly overestimate the retention loss for verbal responses, because the experimenter was unwittingly contaminating his retention scores with proactive inhibition effects. But even given this downward revision of retention loss, we are still faced with showing the proportion of it attributable to interference with goal responses and the part assignable to loss of set. Irion's experiment (1949b), for example, showed that one trial of color naming almost completely eliminated the verbal retention loss, and therefore virtually all of the loss could be described in terms of change in set. This suggests that if the two-factor theory eventually becomes better established in fact, the paradigm of retention studies will have to include groups whose performance of set-reinstating activities will allow a parsing of set and interference components. Interference with goal responses may be a smaller contributor to retention loss than we now surmise. The first research need, however, is a more incisive laboratory attack on the validity of set and its underlying nature.
Among the least satisfactory elements of Hull's behavior system is his formulation of inhibition. As a result, there have been several attempts in recent years to reformulate Hull's theory with respect to the inhibition variables in the equation for effective reaction potential (sĒr).
The present paper critically examines 
these reformulations in the light of 
relevant experimental evidence. The 
conclusions to which this examina- 
tion leads are that these reformula- 
tions have not been an improvement 
over Hull and that this kind of re- 
formulation itself is a futile approach 
to the problem of improving Hullian- 
type learning theory. 

In all versions of his theory Hull (1943, 1951, 1952) formulated “effective reaction potential” (sĒr) as being essentially a function of “drive” (D) and “habit strength” (sHr), related multiplicatively (i.e., D × sHr), minus “reactive inhibition” (Ir) and “conditioned inhibition” (sIr), related additively (i.e., Ir + sIr). Thus:

$${}_S\bar{E}_R = (D \times {}_S H_R) - (I_R + {}_S I_R)$$
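To make the structure of Hull's equation concrete, here is a minimal Python sketch that evaluates it for arbitrary, hypothetical values; the function name and the numbers are ours, not Hull's, and no units are implied. It shows that drive scales habit strength but leaves the inhibitory deduction untouched.

```python
def hull_effective_potential(D: float, sHr: float, Ir: float, sIr: float) -> float:
    """Hull's sĒr = (D × sHr) − (Ir + sIr), with arbitrary illustrative units."""
    excitatory = D * sHr   # drive multiplies habit strength
    inhibitory = Ir + sIr  # reactive and conditioned inhibition simply add
    return excitatory - inhibitory


# Hypothetical values: doubling D doubles the excitatory term,
# while the inhibitory deduction stays at Ir + sIr = 3.
print(hull_effective_potential(D=2, sHr=5, Ir=2, sIr=1))  # 7
print(hull_effective_potential(D=4, sHr=5, Ir=2, sIr=1))  # 17
```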


Most of the attempts to reformu- 
late Hull's equation have been the 
result of logical, or at times merely 
verbal, rather than empirical considerations. For example, Hilgard's (1956, p. 139) criticism is directed at the fact that Hull did not carry out the logical implications of his statement that Ir is a “negative drive state.” As such, Ir logically should subtract from D (i.e., D − Ir) and, like D, should interact multiplicatively with habit strength (i.e., Ir × sHr). Hilgard also suggests that, since sIr is a negative habit, it should interact multiplicatively with Ir. Thus, Hilgard's proposed reformulation of the equation for net reaction potential results in the following:

$${}_S\bar{E}_R = [(D - I_R) \times {}_S H_R] - (I_R \times {}_S I_R)$$


This new formulation seems to be more consistent with some of Hull's own statements about the nature of these intervening variables, but Hilgard avoids trouble by not attempting to relate this formulation to empirical findings.
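The difference between Hilgard's proposal and Hull's equation is easiest to see once the brackets are multiplied out. The sketch below (hypothetical function names, arbitrary integer values) checks the expansion numerically and shows one consequence of the multiplicative term Ir × sHr: for fixed Ir, the loss attributable to inhibition grows with habit strength, which it does not in Hull's additive formula.

```python
from itertools import product


def hilgard(D, sHr, Ir, sIr):
    """Hilgard's form: sĒr = [(D − Ir) × sHr] − (Ir × sIr)."""
    return (D - Ir) * sHr - Ir * sIr


def hilgard_expanded(D, sHr, Ir, sIr):
    """The same form multiplied out: (D × sHr) − (Ir × sHr) − (Ir × sIr)."""
    return D * sHr - Ir * sHr - Ir * sIr


# The factored and expanded forms agree on a small grid of integer values.
assert all(hilgard(*v) == hilgard_expanded(*v) for v in product(range(6), repeat=4))

# Loss relative to D × sHr alone, for two hypothetical habit strengths.
for sHr in (2, 6):
    loss = 4 * sHr - hilgard(D=4, sHr=sHr, Ir=1, sIr=1)
    print(sHr, loss)  # prints "2 3", then "6 7"
```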

Similarly, Iwahara (1957) carries Hull's characterization of Ir as a negative drive and sIr as a negative habit to what may seem the logical conclusion in terms of the internal consistency of Hull's theory: that the relationship between drives and habits is always multiplicative and never additive. Iwahara then goes a step further to regard sIr as a conditioned or secondary negative drive, with Ir being the primary negative drive. From this it follows that the product Ir × sIr should subtract from positive drive, D, and should also multiply sHr. Symbolically,

$${}_S\bar{E}_R = {}_S H_R \times [D - (I_R \times {}_S I_R)]$$

or, in expanded form,

$${}_S\bar{E}_R = ({}_S H_R \times D) - ({}_S H_R \times I_R \times {}_S I_R)$$
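A brief numerical check of the two forms of Iwahara's equation, assuming arbitrary integer values and hypothetical function names. Because inhibition enters only through the product Ir × sIr, the formula as written implies that reactive inhibition lowers sĒr only once some conditioned inhibition exists; this consequence is drawn here from the algebra, not quoted from Iwahara.

```python
from itertools import product


def iwahara(D, sHr, Ir, sIr):
    """Iwahara's form: sĒr = sHr × [D − (Ir × sIr)]."""
    return sHr * (D - Ir * sIr)


def iwahara_expanded(D, sHr, Ir, sIr):
    """Expanded form: (sHr × D) − (sHr × Ir × sIr)."""
    return sHr * D - sHr * Ir * sIr


# Factored and expanded forms agree on a small grid of integer values.
assert all(iwahara(*v) == iwahara_expanded(*v) for v in product(range(6), repeat=4))

# Inhibition enters only as the product Ir × sIr, so Ir has no effect while sIr = 0.
print(iwahara(D=4, sHr=5, Ir=3, sIr=0))  # 20
print(iwahara(D=4, sHr=5, Ir=3, sIr=1))  # 5
```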


Osgood (1953, p. 379) states that Hull need not have postulated sIr at all, since it might have been derived from other postulates in the system. If sIr is nothing other than negative habit strength, or the habit of not responding (reinforced by the dissipation of Ir), it would seem logical to subtract sIr directly from sHr. This is the formulation Osgood has proposed (p. 349).

More recently, Jones (1958) has incorporated the foregoing suggestions in his revision of Hull's equation. The Jones version, which combines the properties of the other revisions (except Iwahara's sHr × sIr) and appears identical to Osgood's suggestion, is as follows:

$${}_S\bar{E}_R = (D - I_R) \times ({}_S H_R - {}_S I_R)$$


That this formulation is radically different from Hull's is even more obvious when Jones expands the equation mathematically:

$${}_S\bar{E}_R = (D \times {}_S H_R) - (I_R \times {}_S H_R) - (D \times {}_S I_R) + (I_R \times {}_S I_R)$$
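Jones' expansion can be verified mechanically. The sketch below, using arbitrary integer values and hypothetical function names, checks that the factored and expanded forms agree and lists the four terms separately, so that the positive Ir × sIr term, the paradoxical addition to reaction potential, is visible.

```python
from itertools import product


def jones(D, sHr, Ir, sIr):
    """Jones' form: sĒr = (D − Ir) × (sHr − sIr)."""
    return (D - Ir) * (sHr - sIr)


def jones_expanded(D, sHr, Ir, sIr):
    """Expanded: (D × sHr) − (Ir × sHr) − (D × sIr) + (Ir × sIr)."""
    return D * sHr - Ir * sHr - D * sIr + Ir * sIr


assert all(jones(*v) == jones_expanded(*v) for v in product(range(6), repeat=4))

# Term-by-term breakdown for hypothetical values D=10, sHr=5, Ir=2, sIr=1.
D, sHr, Ir, sIr = 10, 5, 2, 1
terms = {"D*sHr": D * sHr, "-Ir*sHr": -Ir * sHr, "-D*sIr": -D * sIr, "+Ir*sIr": Ir * sIr}
print(terms)                # {'D*sHr': 50, '-Ir*sHr': -10, '-D*sIr': -10, '+Ir*sIr': 2}
print(sum(terms.values()))  # 32, equal to (10 - 2) * (5 - 1)
```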


Jones’ formulation has been sub- 
scribed to by Eysenck and his co- 
workers in their attempt to utilize 
Hullian postulates in developing a 
theory of personality (Eysenck, 1957; 
Kendrick, 1958). 

Another revision, rather casually 
suggested by Woodworth and Schlos- 
berg (1954, p. 668), is that inhibition 
(Ir or sIr or both?) should subtract 
from “incentive motivation” (Hull’s 
K, a function of the amount of reinforcement). Presumably the total inhibitory potential, İr (the sum Ir + sIr), subtracts from K, though this point is not clear in the Woodworth and Schlosberg discussion. Their suggestion might be expressed symbolically as follows:

$${}_S\bar{E}_R = (K - I_R - {}_S I_R) \times D \times {}_S H_R$$
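One reading of the Woodworth and Schlosberg suggestion, sketched in Python under the assumption, flagged in their discussion as unclear, that both Ir and sIr subtract from K. Names and values are hypothetical. Because the difference K − Ir − sIr multiplies everything else, once total inhibition equals K the formula yields zero however large D and sHr are.

```python
def woodworth_schlosberg(K, Ir, sIr, D, sHr):
    """Assumed reading: sĒr = (K − Ir − sIr) × D × sHr."""
    net_incentive = K - Ir - sIr  # total inhibition subtracted from incentive K
    return net_incentive * D * sHr


print(woodworth_schlosberg(K=5, Ir=2, sIr=1, D=2, sHr=3))  # (5 - 3) * 2 * 3 = 12
print(woodworth_schlosberg(K=3, Ir=2, sIr=1, D=2, sHr=3))  # 0: inhibition cancels incentive
print(woodworth_schlosberg(K=3, Ir=2, sIr=1, D=9, sHr=9))  # still 0, whatever D and sHr
```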


The most carefully formulated and empirically anchored modifications of Hull's theory have been those of Spence (1956). His changes in the inhibition part of the theory are fundamentally different in nature from the other revisions. He has more or less wiped the slate clean and started anew by redefining inhibition and the independent variables of which it is
a function. Spence’s extinctive in- 
hibition (J,) is not a function of the 
amount of effort or rate of responding, as is Hull’s Ir, but is a function
only of the number of nonreinforced 
responses. There is also an oscillatory 
inhibition (J.), which is the same as 
Hull’s concept of oscillation (sOr).
The inhibition due to delay of re- 
ward (J,) is essentially the same as 
I,. The basis of this inhibition is as- 
sumed to be the competing responses 
that are established during the de- 
lay period or during extinction. The 
molar concepts of J; or J, simply 
represent the quantitative effects of 
these competing responses. Spence’s 
inhibition does not interact with 
other intervening variables but only 
subtracts from the reaction potential. 
In this last respect his formulation is 
essentially no different from Hull's. 
It might be asked why D, if it is regarded as an energizer of all responses in the organism's repertoire, should not interact with inhibition as
Spence conceives of it, that is, as 
consisting of interfering or compet- 
ing responses. In this respect 
Spence’s theory of extinction is not 
unlike Guthrie's. 

With the exception of Spence, 
these attempts to reformulate Hull 
raise a number of crucial questions 
in common, some of which must be 
critically examined on the level of 
theory and methodology and others 
in terms of empirical evidence. First 
there are questions of a general 
theoretical nature which must be con- 
sidered in relation to any attempt to 
criticize or reformulate Hull's theory. 

1. Is the verbal formulation of
Hull's theory to be taken more
seriously than the symbolic and 
quasi-quantitative formulations, or 
than the actual empirical relation- 
ships which formed the basis for 
Hull's postulates and which he has 
held up as examples of the relation- 
ships he wished his system to predict? 

2. Does the algebraic manipulation 
of Hull's intervening variables make 
sense theoretically and psychologically? Are the functions representing their interrelationships “isomorphic” with the rules of simple
algebra? 

3. Can experiments be designed to 
determine the exact nature of the in- 
tervening variables? 

Once one has decided to argue within the Hullian framework, a number of questions arise from the attempts at reformulation, the answers to which must depend upon empirical findings.

1. Does sIr subtract from sHr? Are sHr and sIr both basically the same phenomenon, one merely being positive and the other negative in effect, or do they represent basically different processes?

2. Is there any empirical evidence to support the following formulations?

a. The interaction D × sIr (Jones, Osgood)

b. D − Ir (Hilgard, Jones, Osgood)

c. The interaction sHr × Ir (Hilgard, Iwahara, Jones, Osgood)

d. The interaction sHr × sIr (Iwahara)

e. The interaction Ir × sIr, which paradoxically represents an addition to reaction potential, the multiplication of two negative quantities making a positive (Hilgard, Iwahara, Jones, Osgood)

f. K − Ir (Woodworth & Schlosberg)
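Question 2a can be posed numerically. The sketch below compares how much a fixed amount of conditioned inhibition lowers sĒr under Hull's equation and under Jones' revision at two drive levels. All values are arbitrary, Ir is set to zero as if it had fully dissipated, and the function names are hypothetical. Under Hull the deduction is constant; under Jones it equals D × sIr and so doubles with drive. Whether such a difference could actually be tested depends on the operational definitions that the questions above call into doubt.

```python
def hull(D, sHr, Ir, sIr):
    """Hull: sĒr = (D × sHr) − (Ir + sIr)."""
    return D * sHr - (Ir + sIr)


def jones(D, sHr, Ir, sIr):
    """Jones: sĒr = (D − Ir) × (sHr − sIr)."""
    return (D - Ir) * (sHr - sIr)


def sIr_deduction(model, D, sHr, Ir, sIr):
    """How much a given amount of conditioned inhibition lowers sĒr."""
    return model(D, sHr, Ir, 0) - model(D, sHr, Ir, sIr)


# Hypothetical state: sHr = 5, Ir fully dissipated (0), sIr = 2.
for D in (2, 4):
    print(D, sIr_deduction(hull, D, 5, 0, 2), sIr_deduction(jones, D, 5, 0, 2))
# 2 2 4
# 4 2 8
```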
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Tue Liwrrations or HuLL's TH 

In offering his revision, Jones (1958) points out that the inhibition aspect of Hull's formula for reaction potential has been criticized by Koch (1954). Koch's criticisms, however, apply equally to Jones' revision and to all the others, with the possible exception of Spence's. Koch points out that the intervening variables concerning inhibition in Hull's system, particularly sIr, are not rigorously defined, are not clearly tied to experimental variables, and hence are indeterminate. Because of this, it is impossible to make rigorous experimental tests of Hull's formulations or of the alternative revisions. Cotton (1955) has shown that a literal interpretation of Hull's postulates leads to predictions that differ from the experimental data upon which Hull based the formulation of his postulates in the first place. In short, much of Hull's theory does not even predict the very facts it was expressly devised to predict. This is especially true with regard to the inhibition postulates. None of the revisions of Hull has improved this situation. They have merely rearranged in various ways the same indeterminate variables of Hull's formula for sĒr.

Hull's revisers have followed him 
in treating his intervening variables, 
D, sHr, Ir, sIr, etc., as if they were
real, independent quantities whose 
laws of interaction are isomorphic 
with the rules of arithmetic and 
algebra. As we shall see, the manipu- 
lation of these hypothetical variables 
in such fashion can at times lead to 
absurdity. Hull’s intervening vari- 
ables are only intervening variables in 
the sense which MacCorquodale and 
Meehl (1948) have assigned to that 
term, and are defined only in terms 
of the independent and dependent 
variables to which they are tied. The
danger arises when Hull's revisers 
mathematically manipulate the in- 
tervening variables without regard 
for the defining experimental vari- 
ables which are actually all that give 
any meaning to the intervening vari- 
ables. Of course, one of the pur- 
ported virtues of intervening vari- 
ables is that they can be mathe- 
matically manipulated as independ- 
ent entities. But once the interven- 
ing variable has been properly de- 
fined, the question arises as to the 
nature of the mathematical opera- 
tions that can suitably be applied to 
it. It is highly doubtful if the exclu- 
sive use of linear algebra by Hull and 
his revisers is at all suitable. It 
should be noted that in Hull’s own 
statements (1943) the relationship 
between experimental variables and 
intervening variables is usually any- 
thing but linear. If the exact form of 
the functional relationship is not 
known, performing linear algebraic 
operations on the intervening vari- 
ables is practically meaningless. Un- 
der these conditions, for example, one 
cannot prove on the basis of experi- 
mental data whether changes in re- 
sponse strength are the result of an 
additive or a multiplicative rela- 
tionship between intervening vari- 
ables. From more fundamental con- 
siderations, Hilgard (1958) points 
out that Hull’s intervening variables 
cannot in their present form be mul- 
tiplied meaningfully, since they are 
not in comparable units of measurement. Certainly the least objectionable formula for reaction potential is also the least specific. Consequently it has the least predictive power:

$${}_S\bar{E}_R = f(D, K, {}_S H_R, I_R, \ldots)$$

In view of the facts here noted, great difficulties arise when Hull and his revisers become more explicit about the nature of the relationships between these variables.
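The point about additive versus multiplicative relations can be illustrated with a standard measurement argument: if the mapping from reaction potential to the observed response measure is an unknown monotone function, the same pattern can look multiplicative on one scale and purely additive on another. In the sketch below the 2 × 2 design and its values are hypothetical, and the logarithm stands in for an arbitrary monotone rescaling; a nonzero interaction contrast indicates a departure from additivity.

```python
import math


def interaction_contrast(table):
    """Difference of differences for a 2 x 2 table indexed [drive][habit]."""
    return (table[1][1] - table[1][0]) - (table[0][1] - table[0][0])


levels_D = (2.0, 4.0)  # hypothetical drive levels
levels_H = (1.0, 3.0)  # hypothetical habit levels

# The same underlying D × H values, read on two different response scales.
product_scale = [[d * h for h in levels_H] for d in levels_D]
log_scale = [[math.log(d * h) for h in levels_H] for d in levels_D]

print(interaction_contrast(product_scale))  # 4.0: looks multiplicative
print(math.isclose(interaction_contrast(log_scale), 0.0, abs_tol=1e-12))  # True: looks additive
```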

Though it would not be in keeping 
with the spirit of Hull's formal 
theorizing, some of the problems 
might be avoided if Hull's formula 
for sĒr were regarded, not as a true 
mathematical equation, but merely 
as a kind of shorthand for expressing 
certain relationships suggested by 
empirical findings. The arithmetic 
signs of addition, subtraction, and 
multiplication in the formula would 
then not be taken too literally. Thus, 
E = H − I would not be taken to mean that inhibition subtracts from habit and that when E finally equals
zero, the habit has been removed and 
the organism restored to the same 
state as before the habit had been 
acquired. The equation merely 
states in shorthand form that reac- 
tion potential, as inferred from some 
measure of response strength, de- 
creases as the experimental proce- 
dures said to increase habit strength 
are removed and the conditions said 
to produce inhibition are applied. 
The subtraction sign is used here, not 
in a strict mathematical sense, but 
only as a shorthand expression for an
experimental manipulation. Whether 
Hull has chosen to add or to multiply 
various intervening variables most 
likely has been a result of his attempt 
primarily to represent known em- 
pirical relationships rather than to 
maintain logical consistency within 
his theory. He most likely formulated D × sHr, for example, because
he believed this interaction of habit 
and drive represented the experi- 
mental evidence. And most probably 
the reason he did not formulate 
D × sIr, even though his theory
seems to call for this logically, was 
simply because he found no evidence 
that suggests an interaction between 
drive and inhibition. 
From the foregoing considerations,
probably the ultimate conclusion to 
which we are forced regarding the 
attempted revisions of Hull's theory 
is not so much that these revisions 
are no improvement over Hull, but 
that it is futile to attempt to improve 
upon Hull by mere juggling of his in- 
tervening variables. Hullian theory 
will not be improved by continuing 
to work with the concepts of drive, 
habit, inhibition, etc. in exactly the 
same form they were given by Hull. 
The very building blocks of the 
theory, so to speak, are inadequate, 
and no amount of recombining them 
in new ways is likely to result in any substantial advance in learning theory.

REFORMULATIONS AND EMPIRICAL EVIDENCE

sHr − sIr


While Hull (1943) refers to sIr as a “negative habit,” there is no indication in his writing that he regards sIr as merely negative sHr. The revisions suggested by Osgood and by Jones are based on the assumption that sHr and sIr are basically the same phenomenon, sIr merely being the negative counterpart of sHr. Thus, if they are the same process but merely opposite in effect, it seems logical that one should subtract from the other. Similarly, if sHr interacts with drive, so should sIr. Hull, however, quite clearly did not regard sHr and sIr as basically one and the same phenomenon, and his reasons are based on experimental evidence that reveals differences between the two. Pavlov (1927) originally pointed out the greater susceptibility of internal inhibition (of which sIr is one variety) to external inhibition (i.e., disinhibition) than is the case with the excitatory process corresponding to Hull's sHr. That sIr is more labile and sensitive to external influences than is sHr suggests that it is not merely the negative counterpart of the same phenomenon. Therefore, Hull is consistent with Pavlov in not subtracting sIr directly from sHr.
Another line of evidence that excitation (conditioning) and inhibition (extinction) are basically different processes is well demonstrated in a series of experiments by Reynolds (1945a, 1945b), which showed that acquisition of a conditioned response is slower for massed than for distributed trials, while the reverse relationship holds for extinction. Also, a number of studies (Hilgard & Marquis, 1940, p. 119) have shown a negative correlation between the speed of conditioning and of extinction.
The issue of whether the gen- 
eralization gradients of excitation 
(conditioning) and inhibition (ex- 
tinction) are the same or different 
was left undecided by Hull (1943, p. 
265). The Bass and Hull (1934) and 
Hovland (1937) studies referred to by 
Hull were not adequate to answer 
this question. Not finding evidence 
to the contrary, Hull merely as- 
sumed that the generalization grad- 
ients of excitation and inhibition were 
the same, which is a convenient as- 
sumption in his theory of simple dis- 
crimination learning (1943, p. 267) 
based on the interaction of the 
gradients of excitation and inhibi- 
tion. On this point, however, there 
is now some tentative evidence that 
seems to contradict Hull's assumption. Liberman (1951) found that extinction (sIr)¹ has broader transfer effects than acquisition (sHr). Also there is some evidence (Razran, 1938) that the stimulus generalization of extinction (sIr) differs from that of excitation (sHr), in that extinction shows greater stimulus generalization; the gradient of its generalization contains fewer steps; the stimulus generalization of extinction, unlike that of acquisition, does not extend to heterogeneous CRs; and generalization of extinction is more affected by drugs than is generalization of conditioning.

1 In Hull's system, although the entire process of extinction is explained not only in terms of sIr but also in terms of reactive inhibition (Ir), once extinction is complete, or after enough time (probably 5 to 10 minutes) has elapsed for Ir to dissipate, extinction is conceived of as solely a function of the relative magnitudes of the positive reaction potential and sIr.

The formulation sHr − sIr seems misleading in view of the fact that successive periods of acquisition and extinction become more rapid, and that an organism in which an acquired response has been extinguished is not the same as an organism that had never acquired the response. Razran (1956) has pointed out that in a partially extinguished CR there can be shown the coexistence of two opposing processes, positive and negative. “Even a wholly extinguished CR bears, by all signs, within itself a two-way CR connection” (p. 42). Successive acquisition and extinction may be conceived of as a kind of discrimination learning, in which both sHr and sIr grow simultaneously, neither one diminishing the other. The cessation of reinforcement becomes a cue, a conditioned inhibitor, the strength of which increases throughout successive extinction periods (Bullock & Smith, 1953; Perkins & Cacioppo, 1950). This kind of discrimination learning is likely to be a very primitive kind of discrimination not involving symbolic or mediating processes. Tentative evidence for this opinion is found in the experiments on spinal conditioning, which, however, are not yet entirely beyond dispute as examples of true conditioning. Nevertheless, for what it is worth, Shurrager and Shurrager (1946) have reported that both conditioning and extinction, measured at a single synapse in a spinal preparation, become faster with successive periods of conditioning and extinction.

Hull (1952, p. 114) also pointed 
out that the delay CR (the “inhibi- 
tion of delay” being due to s/a) is 
eliminated by certain drugs, for exe 
ample, caffeine and benzedrine. It is 
hard to see why the CR itself would 
not be markedly weakened or elimi- 
nated altogether if these drugs af- 
fected both sHx and sxin the same 
manner. The CR is strengthened, 
however, while the period of delay is 
markedly shortened. Certain drugs 
thus seem to have opposite effects on 
sig and sIr, suggesting again that 
they represent essentially different 
underlying physiological processes. 
Skinner’s (1938, pp. 412-413) finding that benzedrine and caffeine increase the number of responses to a criterion of extinction lends plausibility to the idea that these drugs have different effects on sHr and sIr. If sHr and sIr were the same process, then a drug increasing sHr would also increase the inhibitory effect of each nonreinforced response. If this were the case, the unfailing effect of stimulant drugs in increasing the number of responses to extinction could not easily be accounted for.
The evidence bearing on this subject, however, is not crucial, in that we do not have evidence regarding the percentage increase in responding during extinction under benzedrine over the operant level (preconditioning response rate) under benzedrine. It should also be noted that the theoretical problem hinges to some extent upon the hypothesized relationship between excitation (or sHr) and inhibition (sIr); that is, whether it is the absolute difference between the two that matters or the ratio (or “balance”) between excitation and inhibition. In the Pavlovian system it is the balance or ratio of excitation to inhibition that determines reaction potential. In Hull’s system it is the absolute difference between sHr (and the variables interacting with it) and İr. A strictly Pavlovian revision of Hull might take the following form:


$$ {}_S\bar{E}_R = \log \frac{D \times {}_S H_R}{\dot{I}_R} $$

Thus it is the balance between excitatory and inhibitory processes that is emphasized and not the absolute difference. In this equation, when the total inhibitory potential (İr) is equal in strength to D × sHr, the ratio D × sHr/İr becomes 1.0, and since log 1.0 = 0, the effective reaction potential (sĒr) will equal zero.
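The contrast between Hull's subtractive rule and the ratio form can be checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, the values are in arbitrary units, and `math.log` (natural logarithm) is an assumed choice of base. The base does not affect the zero point, since log 1.0 = 0 in any base.

```python
import math

def hull_difference(drive, habit, inhibition):
    """Hull: effective reaction potential as D * sHr minus total inhibition (İr)."""
    return drive * habit - inhibition

def pavlovian_ratio(drive, habit, inhibition):
    """Hypothetical Pavlovian revision: log of the ratio D * sHr / İr."""
    if inhibition <= 0 or drive * habit <= 0:
        raise ValueError("the ratio form needs positive excitation and inhibition")
    return math.log(drive * habit / inhibition)

# When İr exactly balances D * sHr, both forms give zero.
print(hull_difference(2.0, 1.0, 2.0), pavlovian_ratio(2.0, 1.0, 2.0))

# Doubling excitation and inhibition together doubles the difference
# but leaves the ratio (the "balance") unchanged.
print(hull_difference(2.0, 1.0, 1.0), pavlovian_ratio(2.0, 1.0, 1.0))
print(hull_difference(4.0, 1.0, 2.0), pavlovian_ratio(4.0, 1.0, 2.0))
```

Under the ratio form only the proportion of inhibition to excitation matters, which is the sense in which such a revision emphasizes balance rather than absolute difference.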

The fact that Eysenck and his co-workers have subscribed to the Jones revision would seem incompatible with Eysenck’s (1956) theory concerning the extinction of sIr. The extinction of sIr is paradoxical and inconsistent with other aspects of Hull’s theory and also of Jones’ revision. If, as maintained by Jones and by Eysenck, sIr is merely negative sHr, then the mere lack of reinforcement of sIr (reinforcement being the dissipation or avoidance of Ir) should not result in a decrease in sIr. Lack of reinforcement does not diminish the sHr already present, so it should not diminish sIr either. The notion that extinction is an active process of an increasing inhibition (Ir) depressing performance (sEr) is basic in Hull’s system. It therefore seems absurd, while remaining in the Hullian framework, to speak of the extinction of inhibition without first postulating a second inhibitory process which depresses the first. Fortunately, there is no experimental evidence at present to suggest that such a complication would be necessary.


D × sIr

In Hull’s theory there is no interaction between drive and conditioned inhibition. The D × sIr interaction, however, is explicit in a number of the revisions. Since sIr is the primary and essential intervening variable accounting for experimental extinction, we may well examine the different predictions generated by Hull and the revisions with respect to the D × sIr interaction.
According to Hull, since D multiplies only sHr and not sIr, we should predict that certain measures of extinction will be affected by changes in D. With the Hullian formula D × sHr − sIr, one can predict that under a high drive level there will be a greater number of responses to extinction (n) than under low drive. The same increment of sIr is generated by each response during extinction, regardless of the level of D, while the positive reaction potential (D × sHr) is increased by a higher level of D. Not only does it follow from Hull’s formula that a greater number of responses is required for extinction, but extinction curves under high and low D should be parallel. They approach the criterion of extinction with the same slope, but reach it at different points.
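The Hullian prediction can be traced with a toy simulation. The Python sketch below assumes, for illustration only, that sHr stays constant during extinction, that each nonreinforced response adds a fixed increment to sIr regardless of D, and that responding stops once D × sHr − sIr reaches zero; the function name and the numerical values are hypothetical.

```python
def hull_extinction_curve(drive, habit, increment):
    """Trace Hull's sEr = D * sHr - sIr over successive nonreinforced responses.

    Illustrative assumptions (not from the source): sHr is constant during
    extinction, every response adds the same fixed increment to sIr whatever
    the drive level, and responding stops once sEr falls to zero or below.
    """
    if drive <= 0 or increment <= 0:
        raise ValueError("drive and increment must be positive")
    curve = []
    inhibition = 0.0
    while drive * habit - inhibition > 0:
        curve.append(drive * habit - inhibition)
        inhibition += increment
    return curve

high = hull_extinction_curve(drive=2.0, habit=1.0, increment=0.25)
low = hull_extinction_curve(drive=1.0, habit=1.0, increment=0.25)
print(len(high), len(low))  # more responses to extinction under high D
# Each response lowers sEr by the same amount at both drive levels: parallel curves.
print(high[1] - high[0], low[1] - low[0])
```

With these assumptions the number of responses to extinction grows with D, while every curve falls by the same amount per response, which is the parallel-curves prediction stated above.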
The revisions containing the D × sIr interaction generate predictions that are exactly opposite to the foregoing. If net reaction potential is a resultant of D × sHr − D × sIr, then every increment of sIr will be increased by D to the same degree that sHr has been increased. Consequently, there should be the same number of responses to extinction under high drive as under low drive. Also, the slopes of the extinction curves, as measured by, say, rate of responding, would be different under high and low drive. In other words, the curves would approach the criterion of extinction with different slopes, but would reach it at the same point.

Fig. 1. The relationships between drive (D), number of trials to extinction (n), and effective reaction potential (sĒr) as predicted by Hull’s formulation (left) and by Jones’s formulation (right).
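For comparison, a similar toy simulation can be run with the drive multiplier applied to the difference, as in the D × sIr revisions. This Python sketch is hypothetical: it assumes a constant sHr, a fixed increment of sIr per nonreinforced response, and omits the Ir term (the text argues that Ir does not bear on this comparison).

```python
def jones_extinction_curve(drive, habit, increment):
    """Trace sEr = D * (sHr - sIr) over successive nonreinforced responses.

    Illustrative assumptions (not from the source): sHr is constant during
    extinction, each response adds the same fixed increment to sIr, Ir is
    ignored, and responding stops once sEr falls to zero or below.
    """
    if drive <= 0 or increment <= 0:
        raise ValueError("drive and increment must be positive")
    curve = []
    inhibition = 0.0
    while drive * (habit - inhibition) > 0:
        curve.append(drive * (habit - inhibition))
        inhibition += increment
    return curve

high = jones_extinction_curve(drive=2.0, habit=1.0, increment=0.25)
low = jones_extinction_curve(drive=1.0, habit=1.0, increment=0.25)
print(len(high), len(low))                  # same number of responses to extinction
print(high[1] - high[0], low[1] - low[0])   # but a steeper slope under high D
```

Because D scales every increment of sIr along with sHr, the criterion is reached after the same number of responses at any drive level; only the slope changes.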

If the proponents of the D × sIr formulation object to the foregoing predictions on the grounds that Ir has not been taken into account, let it be pointed out that sIr is essential for complete extinction of the response and that extinction can take place with sufficiently spaced trials to prevent the growth of Ir. If, as Hull hypothesized (1943, pp. 300-301), the formation of sIr is dependent upon nonresponding being coincident with the dissipation of Ir, extinction could not take place if all Ir had dissipated in the interval between each presentation of the nonreinforced CS. Yet extinction is known to occur even with long intertrial intervals of 24 hours or more, when Ir should supposedly have been completely dissipated (Razran, 1956, p. 43). This, along with the fact that in all of the revisions an increment of Ir will reduce sEr by the same proportion regardless of the level of D, makes Ir irrelevant to the present argument. (The D − Ir formulation is discussed at a later point.)

There is a considerable amount of 
experimental evidence bearing on the 
above predictions. The preponderance of evidence favors the Hullian formula and fails to support the notion of a D × sIr interaction. Perin (1942), working with rats, found a
marked positive relationship between 
D (degree of hunger) at the time of 
extinction and the number of re- 
sponses required for extinction. 
Brandauer (1953) extinguished bar 
pressing in rats under three levels of 
drive (thirst) and found a positive 
relationship between strength of drive 
and number of responses during ex- 
tinction. Even under minimal dif- 
ferences in hunger drive (.5, 1, 2 
hours’ deprivation) Saltzman and 
Koch (1948) found highly significant 
differences in number of responses to 
extinction in a modified Skinner box. 
Brown (1956) also found that rats on 
high drive make more responses during extinction than those on low drive. Cautela (1956) showed essentially the same relationship for the extinction of a discrimination response. However, he found a slight decrease in n for levels of D beyond 23 hours’ deprivation. He attributed
this phenomenon to the generaliza- 
tion gradient of the drive stimuli; 
under the highest levels of D, the 
drive stimuli were further out on the 
generalization gradient from the 
drive conditions under which the 
original learning had occurred. The 
energizing and stimulus properties of 
drive are thus apt to interact in this 
type of experiment. 

In experiments with human sub- 
jects, where anxiety has been used as 
a measure of drive, a similar rela- 
tionship with extinction has been 
found. In one study, high anxiety 
subjects required almost twice the 
number of trials to extinguish the 
conditioned eyeblink as did low anx- 
iety subjects (Spence & Farber, 
1953). Bitterman and Holtzman 
(1952) obtained similar results in ex- 
tinguishing the PGR in high and low 
anxiety subjects. 

Skinner’s (1938) early notion of 
the “reflex reserve” appears to be consistent with the D × sIr formulation. Skinner believed that the number of responses emitted during extinction was solely a function of the
number of previously reinforced re- 
sponses and the schedule of rein- 
forcement. Thus drive should not 
affect n, but would affect only the 
rate of emission of responses. The re- 
flex reserve concept, however, has 
long since been found unfruitful. 
While theoretically it is probably not 
a strictly testable hypothesis, it now 
at least appears quite incorrect in 
view of the evidence (Ellson, 1939). 

ARTHUR R. JENSEN

Skinner’s (1938) original belief that rate of responding, but not the number of responses in extinction (n), is affected by drive is contradicted by Bullock’s (1950) investigation showing a correlation of .61 between rate and n. This positive correlation between response rate and number of responses to extinction would certainly seem inconsistent with a D × sIr formulation. If drive increases response rate, sIr should increase faster under higher drive, each response adding the increment D × sIr, thus leading to more rapid extinction. The evidence is exactly the contrary. Higher drive not only increases the rate of response, but also increases the total number of responses to a criterion of extinction.
The best available evidence also 
indicates that the slope of the extinc- 
tion curve is the same under high 
and low drive, as would be predicted 
from Hull’s theory. Sackett (1939) 
showed that when the extinction 
curves of two groups of rats, one 
group extinguished under 6 hours’ 
hunger drive and the other under 
30 hours’ drive, are Vincentized, the 
forms of the two curves are almost 
identical. The 30-hour group pro- 
duced more responses to extinction 
and required more time to extin- 
guish, but the slope of the extinction 
curve was the same as that of the 6- 
hour group. Barry (1958) trained 
rats in a running response and ex- 
tinguished them under high and low 
drive. The extinction curves were 
parallel, and when drive was equal- 
ized in both groups late in extinction, 
the curves converged and were iden- 
tical after three trials. When drive 
was equal for both groups early in ex- 
tinction, and then, later in extinc- 
tion, the groups were run under high 
and low drive, the extinction curves 
diverged, and, after three trials, con- 
tinued almost parallel, as would be 
predicted from Hull. (The fact that it took three trials, rather than one, for the curves to converge or diverge after the change in D, however, is somewhat embarrassing to Hull’s theory, as it is also to the revision.) Both these findings are consistent with the D × sHr − sIr formulation and not with D × sHr − D × sIr. But
these experiments cannot be re- 
garded as at all definitive in view of 
the finding of Reynolds, Marx, and 
Henderson (1952) of an interaction 
between D and the incentive factor 
K (a function of amount of reward). 
This interaction plays havoc with 
any theoretical conclusions drawn 
from experiments on the effects of 
drive on extinction in which the in- 
centive factor has not been taken into 
account. Reynolds et al. (1952) had four groups of rats learn bar pressing under all four combinations of high or low drive and large or small reward. All animals were given extinction trials under equal drive. It
was found that 


in those learning situations where a relatively 
large amount of reward is employed for rein- 
forcement, high D animals extinguish more 
readily than low D animals; and... where a 
relatively small reward is given per reinforce- 
ment, low D animals extinguish more readily 
than high D animals (pp. 41-42). 


Hull’s theory and its revisions generate conflicting predictions regarding spontaneous recovery. In the Jones (1958) formula, sEr = (D − Ir) × (sHr − sIr), spontaneous recovery could occur only if at the end of the first set of extinction trials D − Ir = 0. But this formulation would lead to problems, since, if D − Ir = 0, no habits at all could be activated temporarily until some of the Ir had dissipated, and no behavior of any kind would occur after the end of the first extinction period. We know very well, however, that animals go on behaving in various ways immediately following the extinction of a particular response. But then if we do not wish to assume that D − Ir is equal to zero immediately after the first extinction period, we must assume that sHr − sIr equals zero, or extinction would not have occurred. Yet if sHr − sIr were zero, there could be no spontaneous recovery.
Conceivably one way out of this dilemma for the Jones revision is to make some assumptions about a reaction threshold which must be exceeded before an overt response is made. Thus, overt extinction could occur before either D − Ir = 0 or sHr − sIr = 0. Spontaneous recovery would then result from the dissipation of Ir, as in Hull’s theory. If this were true, one might predict from the Jones revision that there would be very little, if any, spontaneous recovery after extinction under high drive, but greater amounts of spontaneous recovery after extinction under low drive. Since D − Ir would approach the threshold value quickly where D is initially low, there would result an appreciable increase in D − Ir, and hence of response strength, with the dissipation of Ir, and spontaneous recovery would result. Under high drive D − Ir would not approach the threshold value as quickly as would sHr − sIr. Thus, since sHr − sIr would be a smaller value after the first extinction, there should be less spontaneous recovery at the beginning of subsequent extinction periods.

Different predictions may be made from Hull and the D × sIr revision concerning the effect of an increase in drive after extinction is complete. According to Hull’s (D × sHr) − sIr, an increase in drive after complete extinction should result in further “spontaneous recovery.” According to the D × (sHr − sIr) formulation, once extinction is complete (i.e., sHr − sIr = 0), an increase in D should not produce any “spontaneous recovery.”
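The two predictions can be stated as a short calculation. The Python sketch below is a hypothetical illustration: it takes extinction under Hull's rule to be complete when sIr has grown to equal D × sHr at the extinction drive level, takes extinction under the revision to be complete when sHr − sIr = 0, ignores Ir, and then raises D.

```python
def hull_net(drive, habit, inhibition):
    """Hull: (D * sHr) - sIr."""
    return drive * habit - inhibition

def revision_net(drive, habit, inhibition):
    """D x sIr revision: D * (sHr - sIr)."""
    return drive * (habit - inhibition)

HABIT = 1.0             # hypothetical sHr, arbitrary units
EXTINCTION_DRIVE = 1.0  # hypothetical D during extinction

# sIr at the moment extinction is complete under each rule.
hull_inhibition = EXTINCTION_DRIVE * HABIT
revision_inhibition = HABIT

for new_drive in (1.0, 2.0, 3.0):
    print(new_drive,
          hull_net(new_drive, HABIT, hull_inhibition),
          revision_net(new_drive, HABIT, revision_inhibition))
```

Any rise in D above the extinction level restores a positive sEr under Hull's rule, while the revision leaves it at zero however far D is raised.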

Unfortunately, the experimental 
evidence bearing on all these predic- 
tions is meagre, conflicting, and in- 
conclusive. Hull (1943, p. 249) cites 
Pavlov’s finding that an increase in 
drive after extinction is complete 
causes the reappearance of the CR 
in the presence of the CS. This is, of 
course, consistent with Hull’s for- 
mulation, but not with the DX sIr 
formulation. The same phenomenon 
seems to occur also in instrumental 
conditioning. Jenkins and Daugherty 
(1951) extinguished a pecking re- 
sponse in pigeons under three levels 
of drive. They found that the num- 
ber of responses in extinction is a 
function of drive level and that when 
extinction was relatively complete an 
increase in drive caused gross re- 
covery of the conditioned behavior. 
The authors used the term “relatively complete” extinction because the pecking response in pigeons never seems to be completely extinguished. But the recovery of a “relatively extinguished” CR under increased drive is certainly more consistent with (D × sHr) − sIr than with D × (sHr − sIr). The writer
knows of only one study that ap- 
pears to contradict the finding of 
Jenkins and Daugherty. Crocetti 
(1952) found that when rats were 
“completely” extinguished in a Skin- 
ner box, increase in drive did not in- 
crease the response rate over the pre- 
conditioning response rate under the 
higher level of drive. (Extinction 
was considered complete when the 
response rate became equal to the 
operant level prior to conditioning.) 
This finding is, of course, inconsistent with Hull’s (D × sHr) − sIr. Crocetti did not control for the changes in the drive stimulus (SD) with increased hunger, and so his finding is not definitive with respect to the present theoretical issue. If we assume that sHr and sIr are conditioned to SD as well as to other stimuli, then the changes in SD from the conditioning trials to the extinction trials or spontaneous recovery trials become a crucial point in this type of experiment. Fortunately, in an experiment by Lewis and Cotton (1957) the effect of such changes in SD was taken into account. Three
groups of rats were trained in a run- 
ning response under three levels of 
drive, viz., 1, 6, and 22 hours’ food 
deprivation. Each group was then 
divided into three groups which un- 
derwent extinction under 1, 6, and 22 
hours’ drive. Extinction proceeded 
more rapidly under lower drive, as 
would be expected from Hull’s formulation, but drive level seemed to have no effect on the magnitude of spontaneous recovery, a fact which is inconsistent with (D × sHr) − sIr. But the D × (sHr − sIr) revision cannot comprehend both of these findings either, for with this formulation
drive level should have no effect on 
number of trials to a criterion of ex- 
tinction. It seems obvious that 
clarification of the effects of drive on 
spontaneous recovery must await 
further experimentation which is 
specifically designed for this purpose 
and which takes into account both 
the energizing and stimulus proper- 
ties of drive. Some of the lack of 
consistency and agreement in this 
area may also be due to interspecies 
differences and to the use of different 
measures of response strength. La- 
tency, running time, response rate, 
and number of trials to extinction are 
used singly in different studies as 
measures of response strength even 
though they are far from being perfectly correlated. Each measure undoubtedly involves certain parameters peculiar to itself. To use only
one such measure of response strength 
and only one species of animal is an 
inadequate method for testing a pre- 
cise deduction from a general be- 
havior theory. 

In the delayed CR, the inhibition of the response during the period of delay is attributed in the Hullian system to sIr (Hull, 1952, p. 114). Consistent with Hull’s formulation of D × sHr − sIr is the fact that an increase in D lessens or eliminates the period of delay in the CR. The D × sIr formulation does not accommodate this fact, but leads to an opposite prediction, i.e., an increase in D should strengthen the inhibition of delay.
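A minimal numerical illustration of the two predictions, assuming (hypothetically) that during the delay period conditioned inhibition exceeds habit strength, so that net reaction potential is negative and no response occurs. The Python sketch compares how raising D changes that net value under each formulation; the values are arbitrary and Ir is ignored.

```python
HABIT = 1.0             # hypothetical sHr to the CS during the delay
DELAY_INHIBITION = 1.5  # hypothetical sIr responsible for inhibition of delay

def hull_net(drive):
    """Hull: D multiplies sHr only."""
    return drive * HABIT - DELAY_INHIBITION

def revision_net(drive):
    """D x sIr revision: D multiplies the difference."""
    return drive * (HABIT - DELAY_INHIBITION)

for drive in (1.0, 2.0):
    print(drive, hull_net(drive), revision_net(drive))
```

Under Hull's rule the higher drive pushes the net value above zero, so the CR breaks through during the delay; under the revision the same change deepens the inhibition.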

One of the weakest points in Hull’s system involves the dependence of sIr upon Ir. It is no less troublesome to any of the revisions. (Spence excepted, since his inhibition concept has nothing in common with Ir.) It is stated that Ir is generated whenever a response is made, the amount of Ir being a function of the effortfulness of the response, and that Ir rapidly dissipates, accumulating only if responses follow one another in rapid succession. The dissipation of Ir, a “negative drive state,” reinforces the habit of not responding, which is sIr. This hypothesis encounters obvious difficulties. If a response is followed by the dissipation of Ir, this would seem to have all the requirements for reinforcing the response, leading to increased response strength rather than extinction.²


Also, subzero extinction would be unlikely if increases in sIr were dependent upon reactive inhibition (Ir). And it is almost impossible to explain the extinction of relatively effortless CRs, such as salivation, eyeblink, and the alpha rhythm, when the extinction trials are widely spaced. Pavlov (1927, p. 76) obtained rapid extinction of the salivary CR using only one presentation of the CS per day. Razran (1956, p. 43) has reviewed the evidence that contradicts a theory of extinction based on reactive inhibition. There are even cases where spaced trials have led to more rapid extinction than massed trials (Sheffield, 1950; Stanley, 1952). Kimble (1950) has argued from studies of motor learning that a certain threshold or critical level of Ir must be reached before sIr develops. Motor learning experiments have presumably shown that Ir can form without leaving behind any sIr. This is inconsistent with extinction based on widely spaced trials. In fact, it does not seem to the writer that the Hullian inhibition postulates, as they have been used in the field of motor learning, represent the same processes found in extinction phenomena. It has been a case of giving the same theoretical labels to basically different processes. The most fundamental difference between sIr in conditioning and in motor learning has to do with the amount of response necessary to produce sIr. Five or six minutes of pursuit rotor practice seems necessary before sIr is in evidence, while only a single conditioned response, such as salivation, PGR, or eyeblink, is evidently sufficient to produce sIr. Thus it does not seem that the sIr invoked in theories of motor learning could be the same sIr as that in Hull’s theory of conditioning.

² One can get around this problem, of course, by invoking the gradient of reinforcement. If the time required for the dissipation of Ir is longer than the effective gradient of reinforcement, the foregoing proposition would not hold true. At present there is no basis for arguing the point. While Hull gives 20-30 seconds as the maximum delay between the response and reinforcement if reinforcement is to be effective, the time required for the dissipation of Ir is solely a function of the amount of Ir generated by the response and, therefore, is variable, although the rate of dissipation of Ir may not be variable. Perhaps an even simpler way out is the idea that Ir leads to a “resting response” which in turn is reinforced by the dissipation of Ir.

Fig. 2. Illustration of the algebraic summation theory of discrimination. (Effective reaction potential, sĒr, is a result of subtraction of generalized extinction, sIr, from generalized conditioning, sHr. See text for full explanation.)

It is also held by Hull, and even more explicitly by his revisers, that the amount of sIr built up per trial is related to the amount of Ir dissipated, the dissipation acting as reinforcement for the negative habit, sIr. But this is inconsistent with Hull’s own revision of his theory (Hull, 1951), in which the growth of habit, sHr, and presumably also of negative habit, sIr, is a function only of the number of reinforcements and not the amount of reinforcement. None of these awkward predicaments has been remedied by the revisions here reviewed. Those revisions insisting on the theoretical equivalence of sHr and sIr as being merely positive and negative habits have retained one of the weakest elements in the Hullian system.

Discrimination learning.

If discrimination learning involves an increase in habit strength to the positive stimulus (S^D) and an increase in inhibition (Ir and sIr) to the negative stimulus (S^Δ), then the effects of drive on discrimination learning should be highly germane to the plausibility of the D × sIr formulation. Jones (1958) invokes Spence’s (1937) theory of discrimination learning, adapted by Hull, involving the overlapping generalization gradients of sHr and sIr, in support of the D × sIr part of his revision. This theory is illustrated in Figure 2. The discrimination would be perfect (except for behavioral oscillation) when the net reaction potential resulting from subtracting the generalized sIr from the generalized sHr is some positive value for S^D and zero for S^Δ, as in Figure 2. Jones (1958) argues that, according to Hull’s D × sHr − sIr, an increase in D should obliterate the learned discrimination. Since some discriminations are not obliterated or even weakened by an increase in D, Jones reasons that sIr must also be multiplied by D, so that the increase in sIr will be proportional to the increase of sHr when multiplied by D, thereby preserving the discrimination.
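Jones' argument can be reproduced with a toy version of the algebraic summation model. The Python sketch below assumes hypothetical Gaussian generalization gradients on an arbitrary one-dimensional stimulus scale (the actual shapes of these gradients are unknown), places S^D at 0 and S^Δ at 1, ignores Ir, and sets the peak of sIr so that net reaction potential at S^Δ is exactly zero when D = 1.

```python
import math

def gradient(peak, center, x, width=1.0):
    """Hypothetical Gaussian generalization gradient on a 1-D stimulus scale."""
    return peak * math.exp(-((x - center) ** 2) / (2 * width ** 2))

S_D, S_DELTA = 0.0, 1.0   # hypothetical positions of S^D and S^Δ
HABIT_PEAK = 1.0          # peak of sHr, centred on S^D
# Peak of sIr, centred on S^Δ, chosen so that sHr - sIr = 0 at S^Δ.
INHIB_PEAK = gradient(HABIT_PEAK, S_D, S_DELTA)

def s_hr(x):
    return gradient(HABIT_PEAK, S_D, x)

def s_ir(x):
    return gradient(INHIB_PEAK, S_DELTA, x)

def hull_net(drive, x):
    """Hull: D multiplies sHr only."""
    return drive * s_hr(x) - s_ir(x)

def jones_net(drive, x):
    """Jones (1958): D multiplies the difference."""
    return drive * (s_hr(x) - s_ir(x))

for drive in (1.0, 2.0):
    print(drive,
          round(hull_net(drive, S_DELTA), 3),   # net sEr to S^Δ under Hull
          round(jones_net(drive, S_DELTA), 3))  # net sEr to S^Δ under Jones
```

In this toy model, raising D under Hull's rule gives S^Δ a positive net reaction potential, which is the loss of discrimination Jones derives from Hull; with D multiplying the difference, S^Δ stays at zero for any D.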

INHIBITION IN HULL'S SYSTEM 287

Before Jones’ argument can be evaluated, some clarification of the Spence-Hull theory of discrimination learning is necessary. In the first place, there is often confusion concerning whether discrimination learning is a matter only of the relative strengths of sEr to the S^D and S^Δ, or whether the formation of a discrimination requires the reduction of sEr to the S^Δ to zero or at least below the operant level of the response, i.e., below the strength of the response before any conditioning or extinction has occurred. If the former, then all that would be necessary for discrimination to take place would be that the S^D have greater sEr than the S^Δ. The sEr to the S^Δ would not necessarily have to undergo some degree of extinction. If this were the case, Jones’ use of the Spence-Hull theory of discrimination, as illustrated in Figure 2, would not be applicable to the present argument concerning the effects of drive on discrimination learning. The evidence, however, strongly suggests that the sEr to the S^Δ must undergo some degree of extinction for discrimination to become nearly perfect. To this extent, at least, the Spence-Hull theory appears to be correct.

For example, Grice (1948) gave one group of rats 200 rewarded trials in responding to a disc 8 centimeters in diameter and gave another group of rats 200 rewarded trials in responding to a 5-centimeter disc. Then both groups were given discrimination training, with the 8-centimeter disc as the S^D and the 5-centimeter disc as the S^Δ. The group which had been previously rewarded on the 8-centimeter disc learned the discrimination faster. Now if all that were involved in discrimination were the relative response strengths to the S^D and S^Δ, the 8-centimeter group should have learned to make the discrimination immediately, since response to the S^D had already been rewarded on 200 trials, and the response strength to the S^Δ resulting from stimulus generalization would have been less than the response strength to the S^D. The learning curve for the acquisition of the discrimination is very gradual, however, which suggests that extinction of the response to the S^Δ through nonreinforcement is necessary for the learning of the discrimination.

An even more cogent demonstration of the necessity for extinction of the response to the S^Δ in discrimination learning is an experiment by Fitzwater (1952). Three groups of rats were used: Groups A, B, and C. In preliminary training Group A was run an equal number of times into each of two alleys having differential cues (call them X and Y, respectively). X was always reinforced; Y was never reinforced. Group B was run an equal number of times into each of two alleys having the cues X and Z. X was always reinforced; Z was never reinforced. Group C was run only into one alley, with Cue X, the same number of times as the other groups. Then discrimination training began, with the animals having to learn to discriminate X as the S^D and Y as the S^Δ. Group A learned the discrimination most rapidly, while Groups B and C did not differ significantly in speed of learning. The theoretical interpretation of these results is that Group A had already built up inhibition to the S^Δ, while Groups B and C had not. Fitzwater concluded that

apparently in a visual discrimination task it is about as important to establish an avoidance habit as an approach habit, and that an appreciable discrimination does not seem to occur if an approach habit is established alone (p. 480).


The terms “approach habit” and “avoidance habit” may be interpreted in the context of the present discussion as excitation (or sHr) and extinction (sIr), respectively. Thus it is apparent that a decrease in sEr to the S^Δ as well as an increase in sEr to the S^D is necessary for discrimination learning. It is not just a matter of sEr to the S^D being relatively greater than to the S^Δ.

Another experiment by Grice (1949) offers further evidence that discrimination depends upon the extinction of the response to the S^Δ and not merely a relative difference in response strengths between S^D and S^Δ. One group of rats was trained in a visual size discrimination with S^D and S^Δ presented simultaneously, and another group was trained on the same discrimination with S^D and S^Δ presented successively in random order. Grice found no difference between the “simultaneous” and “successive” groups in the rate of learning the discrimination. In both cases learning apparently consisted of increasing the response strength to the S^D and completely extinguishing the response to S^Δ. Furthermore, it was found that the rats which learned the problem as a pair (i.e., simultaneous presentation) responded differently to the S^D and S^Δ when they appeared singly, showing that even under simultaneous presentation of the S^D and S^Δ, the response to the S^Δ had undergone extinction.
It is not maintained that complete extinction of the response to S^Δ is necessary. Extinction is a relative matter and is probably best measured, not in relation to some theoretical “absolute zero,” but in relation to the “operant level” or probability of occurrence of the particular response before extinction trials have been assumed to take place. In the Grice (1949) experiment there was a decrease in latency of response to S^D and an increase in latency of response to S^Δ, whether the two stimuli were presented simultaneously or successively. sEr to the S^Δ, as measured by latency, was considerably less at the end of discrimination training than at the beginning. In fact, extinction of response to S^Δ may play a greater role in discrimination learning than does the strengthening of the response to S^D. Webb (1950) trained rats to jump to a black-white discrimination until it was well learned. When, after training, only the S^D was presented to the rats, the mean latency of their response was 2.0 seconds, which was not significantly less than the pretraining latency. On the other hand, when only the S^Δ was presented, the mean latency of response was 80.5 seconds, which may be interpreted as indicating considerable extinction or inhibition of the response to the S^Δ. If one defines the zero level of sEr in the Hull-Spence model in Figure 2 simply as the operant level (i.e., the pretraining latency or probability of responding to the particular stimuli), then this model appears to be quite consistent with the experimental evidence in showing that discrimination depends upon extinction of the response to the S^Δ.

This model, however, seems to be 
deficient in some other respects. 
Hanson (1957), for example, per- 
formed a very careful experiment 
which led to the conclusion that 
over-all response strength is not 
weakened by discrimination train- 
ing, as would be predicted from the 
Spence-Hull model. (That is, since the resultant sEr is the algebraic sum of generalized excitation and inhibition, sEr to the S^D should be less
after discrimination training than it 
would be in simple conditioning to a 
single stimulus.) Hanson concluded 
that 


the major result of discrimination training is 
to bring a large proportion of the responses 
available in extinction under the control of 
another range of stimuli, those which do not 
ordinarily gain control of the response as the 
result of simple conditioning without differ- 
ential reinforcement (p. 889). 


This conclusion is not compatible 
with the Spence-Hull theory. 

It may be argued that Jones has taken the Spence-Hull diagram (Figure 2) too literally. Very little is known about the actual shapes of the generalization gradients of sHR and sIR, and until a proper metric is worked out, arguments over this point cannot be settled. What little evidence there is, though far from conclusive, suggests that the generalization gradients of excitation and inhibition are probably different in a number of respects (Razran, 1938). Furthermore, the amount of overlap of the gradients of excitation and inhibition will depend on the distance apart of S^D and S^Δ, and there is reason to believe that the effects of drive on discrimination will interact with the degree of disparity between S^D and S^Δ (Broadhurst, 1957). We would predict from Hull's D × sHR − sIR that the farther apart S^D and S^Δ are, the less deleterious to the discrimination are the effects of increased drive. This essentially is the Yerkes-Dodson Law (Yerkes & Dodson, 1908), which, in its most general form, states that the optimum motivation for a learning task decreases with increasing difficulty. This relationship between drive and difficulty of discrimination, however, cannot be predicted from the Jones formulation of D × (sHR − sIR).

Rather than arguing from a highly hypothetical model involving the relative shapes and magnitudes of the generalization gradients of sHR and sIR, as Jones has done, we can better make predictions concerning the directly observable effects of increased drive on discriminations. What is the effect of drive on the initial learning of a discrimination, and does an increase in drive have a different effect on the learning of easy and difficult discriminations, as determined by time required for learning? What is the effect of change in drive on discriminations that are already established? What effect does a change in drive have on the extinction of a discrimination?

In discrimination learning, since the relative amounts of sHR and sIR built up to the S^D and S^Δ are different, we would expect from the D × (sHR − sIR) formula that an increase in D would always have a facilitative effect on learning a discrimination. The degree of facilitation would depend upon the degree of difference between S^D and S^Δ. If we assumed considerable overlapping of generalization gradients, then there would be relatively little effect of an increase in D. If the discrimination were easy, increases in D should improve the discrimination, since the relatively greater sHR to the S^D and the relatively greater sIR to the S^Δ would both be multiplied by D. In no case should discrimination be weakened by an increase in D.

On the other hand, if we assume that response to S^Δ must undergo extinction for a discrimination to be learned, Hull's formula D × sHR − sIR leads to quite different predictions, viz., that an increase in D should weaken difficult discriminations, where one might assume overlap of the stimulus generalization gradients (since D multiplies the habit generalized to S^Δ but not the inhibition opposing it), but would strengthen discriminations in which S^D and S^Δ are widely separated on the generalization gradient.
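The two predictions can be contrasted in a small numerical sketch. The values are hypothetical, and the sketch assumes, for illustration only, that a response to S^Δ occurs whenever its sER exceeds zero and that sHR and sIR at S^Δ stay at the levels reached in training while drive is raised. The function names are illustrative, not taken from Hull or Jones.

```python
# Hypothetical illustration: does raising drive D revive responding to S-delta?
# Assumption: a response occurs whenever reaction potential sER exceeds 0.

def revised(D, H, I):
    """Revised formulation: sER = D * (sHR - sIR)."""
    return D * (H - I)

def hull(D, H, I):
    """Hull's formulation: sER = D * sHR - sIR (inhibition not multiplied by D)."""
    return D * H - I

def responds(formula, D, H, I):
    return formula(D, H, I) > 0

# Hypothetical sHR and sIR at S-delta after discrimination training.
difficult = (0.8, 0.9)  # overlapping gradients: much habit generalized to S-delta
easy = (0.1, 0.9)       # widely separated stimuli: little generalized habit

for label, (H, I) in [("difficult", difficult), ("easy", easy)]:
    for D in (1.0, 2.0):
        print(label, D,
              "revised:", responds(revised, D, H, I),
              "Hull:", responds(hull, D, H, I))
```

Under D × (sHR − sIR) the outcome never depends on D, since D scales habit and inhibition alike. Under D × sHR − sIR, doubling D revives responding to S^Δ only in the overlapping ("difficult") case, which is the weakening of difficult discriminations described above.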

What is the evidence? We have already mentioned the Yerkes-Dodson Law, which is possibly consistent with Hull, but certainly not with the D × (sHR − sIR) formula. Broadhurst (1957) has demonstrated this "law" most effectively, using rats in a brightness discrimination problem and manipulating drive by means of oxygen deprivation. Skinner (1938, p. 188) has observed that it is important in establishing discriminant operant conditioning to keep the hunger drive as constant as possible, for changes in drive disturb the discrimination. More explicitly, Teel (1952) has shown that in selective learning, in which correct responses are reinforced and incorrect responses are nonreinforced or extinguished, rats under high drive (food deprivation) require a greater number of trials to reach a criterion of learning than rats under low drive. One cannot predict these facts from the D × (sHR − sIR) formula. In fact, just the opposite outcome would be predicted for the Teel experiment. With human subjects, Hilgard, Jones, and Kaplan (1951) found that high-anxiety subjects (on the Taylor Manifest Anxiety scale) had greater difficulty than low-anxiety subjects in forming a discriminatory CR. It is well established that anxious subjects develop simple eyeblink CRs more readily than nonanxious subjects. (This relationship has not been found to hold, however, for autonomic CRs.) Interpreting anxiety as a drive, both sets of findings are consistent with Hull, but not with D × (sHR − sIR). An experiment by Spence and Farber (1954) found that the difference between high- and low-anxious subjects in forming a discriminatory response showed up only on the S^D but not on the S^Δ. That is, D (anxiety) seemed to affect only the CS (i.e., S^D) associated with relatively greater sHR and not the CS (i.e., S^Δ) associated with relatively greater sIR. Spence interprets this finding as evidence that D interacts with excitation (sHR) but not with inhibition (sIR).

In a well-established discrimination, in which S^D and S^Δ are relatively far apart on the stimulus generalization gradient, and in which relatively more sHR (as against sIR) has been built up to S^D than to S^Δ, and relatively more sIR built up to S^Δ than to S^D, we would predict from D × (sHR − sIR) an improvement in the discrimination with an increase in drive. That is, the ratio of number of responses to S^D to number of responses to S^Δ should increase, since response to S^D is increased by D × sHR, and inhibition of response to S^Δ is increased by D × sIR. Dinsmoor (1952) performed an experiment bearing on this point. A simple discrimination habit was well established in rats in the Skinner box, with S^D being the presence of light and S^Δ being total darkness. When D was increased to varying degrees by food deprivation, the number of responses per unit of time to both S^D and S^Δ increased, but the ratio of S^D to S^Δ responses remained exactly the same at seven different degrees of hunger. In short, the discrimination was not improved by an increase in D. Though Hull's theory is not sufficiently quantified to have precisely predicted the outcome of this particular experiment (because absolute levels of D and sHR as well as the jnd's between S^D and S^Δ must be taken into account), at least the result is consistent with the (D × sHR) − sIR formulation.

There is other experimental evi- 
dence, however, which suggests that 
both the Hullian and the revised for- 
mulations are inadequate to explain 
the effects of drive on discrimination 
learning. A number of studies have 
found no relationship at all between 
drive and proficiency in selective 
learning or solving discrimination 
problems (Meyer, 1951; Miles, 1959; 
and a number of doctoral disserta- 
tions reported by Spence, Goodrich, 
& Ross, 1959). Spence et al. (1959) 
have scrutinized the conflicting find- 
ings in this field with a view to dis- 
covering the reason for the lack of 
agreement between various investigations on the effect of drive on selective learning and discrimination.

INHIBITION IN HULL'S SYSTEM 291

They arrived at the hypothesis that performance in selective learning (such as learning a black-white discrimination) is independent of drive level when responses to the S^D and S^Δ are equated, but varies with drive when responses are not equated. They performed a set of experiments which supported this hypothesis. The results are inexplicable in terms of Hull's theory or any of its revisions except that of Spence. These findings suggest that the growth of sHR is not a function of number of reinforced responses, as in Hull's system, but is a function merely of the number of responses, whether reinforced or not. The growth of inhibition is a function only of the number of nonreinforced trials. This formulation will account for the major finding of the experiment by Spence et al. (1959). But another aspect of their findings remains inexplicable in terms of any current theory of learning. When responses to S^D and S^Δ were equated, an increase in drive increased the response strength to both the S^D and S^Δ. But when the rats were forced to respond twice as often to S^D as to S^Δ, an increase in drive increased the response strength to S^D but decreased response strength to S^Δ. Spence et al. concluded that


the results of the two (experiments) are in 
fundamental disagreement so far as the effects 
of drive differences on the strength of nonrein- 
forced responses are concerned. It is perhaps 
obvious that we need to obtain much more 
knowledge than we now possess concerning 
the variables affecting the development of re- 
sponse decrement with nonreinforcement. 
Unfortunately, this problem has been badly 
neglected in conditioning experiments with 
the consequence that such an empirically 
based theory as the present one [i.e., Spence’s 
theory] is weakest in this area (p. 15). 


Though the present state of our knowledge in this area does not permit any definite conclusion regarding the effects of drive on discrimination learning, it appears that no current theory is able to comprehend all the relevant facts now available.
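The Spence et al. account summarized above (habit growing with every response, inhibition only with nonreinforced ones) can be illustrated with a small sketch. Every quantitative choice here is an assumption for illustration, not taken from Spence et al. (1959): habit grows with the number of responses as 1 − exp(−0.02n), inhibition grows by 0.005 per nonreinforced response, reaction potential is computed as D × sHR − sIR, and the discrimination is measured as the difference in sER between S^D (always reinforced) and S^Δ (never reinforced).

```python
import math

def habit(n_responses, rate=0.02):
    """Hypothetical habit growth: depends on responses made, reinforced or not."""
    return 1.0 - math.exp(-rate * n_responses)

def inhibition(n_nonreinforced, step=0.005):
    """Hypothetical inhibition: grows only with nonreinforced responses."""
    return step * n_nonreinforced

def discrimination(D, n_sd, n_sdelta):
    """sER(S-D) minus sER(S-delta), with sER = D * sHR - sIR (an assumed form).

    S-D responses are all reinforced; S-delta responses are all nonreinforced.
    """
    e_sd = D * habit(n_sd) - inhibition(0)
    e_sdelta = D * habit(n_sdelta) - inhibition(n_sdelta)
    return e_sd - e_sdelta

for D in (1.0, 2.0, 3.0):  # hypothetical drive levels
    equated = discrimination(D, n_sd=100, n_sdelta=100)
    unequated = discrimination(D, n_sd=200, n_sdelta=100)
    print(f"D={D}: equated {equated:.3f}, unequated {unequated:.3f}")
```

When the response counts are equated, the two habits are equal and the D × sHR terms cancel, leaving only the difference in inhibition, so the discrimination is the same at every drive level. When S^D receives twice as many responses, the habit difference is multiplied by D and the discrimination changes with drive.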

But now let us ask: What happens when a discrimination is extinguished under various levels of drive? Cautela (1956) trained rats in a discrimination under 23 hours' food deprivation and then extinguished the discriminative response under 0, 6, 12, 23, 47, and 71 hours' deprivation. The criterion of extinction was failure to respond to either S^D or S^Δ within 3 minutes. Many more responses were required for extinction under high drive levels (23, 47, or 71 hours' deprivation) than under low drive (0, 6, or 12 hours). This result can be predicted from D × sHR − sIR. On the other hand, it is difficult to see why a change in drive should have any effect on the number of responses to extinction if sHR and sIR are both increased or decreased proportionately by changes in D, as stated in the revised formula.
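The argument about Cautela's result can be made concrete by counting nonreinforced responses until sER falls to zero. This is a sketch under stated assumptions, none of them from the experiment: integer units, a fixed habit of 20 units acquired in training, 1 unit of sIR added per nonreinforced response, and responding whenever sER is above zero.

```python
def hull(D, H, I):
    """Hull: sER = D * sHR - sIR."""
    return D * H - I

def revised(D, H, I):
    """Revised: sER = D * (sHR - sIR)."""
    return D * (H - I)

def responses_to_extinction(formula, D, H=20, step=1):
    """Count nonreinforced responses emitted while sER stays above zero.

    Each nonreinforced response is assumed to add `step` units of sIR.
    """
    I = 0
    count = 0
    while formula(D, H, I) > 0:
        count += 1
        I += step
    return count

for D in (1, 2, 3):  # hypothetical low, medium, and high drive during extinction
    print(D, responses_to_extinction(hull, D), responses_to_extinction(revised, D))
# Prints: 1 20 20, then 2 40 20, then 3 60 20 (one line per drive level)
```

With these assumptions Hull's D × sHR − sIR needs 20, 40, and 60 responses at D = 1, 2, and 3, because more inhibition is needed to cancel a larger D × sHR. The revised D × (sHR − sIR) needs 20 at every drive level, since D cannot move the point at which sHR − sIR reaches zero.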


D − IR

Since Hull referred to reactive inhibition (IR) as a "negative drive," he has been accused of logical inconsistency for adding a drive to a habit (i.e., IR + sIR), and the suggested remedy has been the obvious one, viz., to subtract IR from D. But predictions from this formulation lead to empirical embarrassment. For example, when extinction is carried out under massed trials and, after a period of rest, there is some spontaneous recovery, we must assume, according to the sER = (D − IR) × (sHR − sIR) formulation, that D − IR = 0 at the end of the first extinction period. For there would be no spontaneous recovery if it were sHR − sIR that had become equal to zero. Yet, according to Hullian theory (including the revisions), no behavior can occur unless D is greater than zero. And it is known that an animal at the end of extinction is far from being inactive. Only the extinguished CR becomes inactive, while other behavior in the animal's repertoire is immediately evident. Theoretically this could not be so if the drive component in the equation for reaction potential were zero.

Experimental evidence contradicting D − IR is presented by Hull (1952, p. 50). A rat is trained to press either of two bars in different locations in a Skinner box to obtain food. During extinction the rat alternates its response from one bar to the other. IR does not have to dissipate before the alternate bar can be pressed. This strongly suggests that IR must be associated with the particular response, rather than cause a diminution in the total drive state, which in the Hullian system is an amalgam of all the organic needs of the moment and their associated "drive stimuli" (S_D).
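The contrast between a pooled, drive-like IR and Hull's response-specific IR can be sketched for the two-bar situation. The values are hypothetical, sIR is omitted for simplicity, and the `pooled_drive` function is one assumed reading of the D − IR revision (reactive inhibition subtracted from a single drive shared by all responses), not a formula taken from its proponents.

```python
def pooled_drive(D, reactive_inhibition, habits):
    """Assumed 'D - IR' reading: IR is subtracted from one common drive,
    so it lowers sER for every response (sIR omitted for simplicity)."""
    effective_drive = max(D - sum(reactive_inhibition.values()), 0.0)
    return {r: effective_drive * h for r, h in habits.items()}

def response_specific(D, reactive_inhibition, habits):
    """Hull: IR belongs to the response that produced it and is subtracted
    from that response's sER only."""
    return {r: D * h - reactive_inhibition.get(r, 0.0) for r, h in habits.items()}

habits = {"bar A": 1.0, "bar B": 1.0}  # hypothetical equal habits
inhibition = {"bar A": 1.0}            # hypothetical IR from massed pressing of bar A
D = 1.0

print(pooled_drive(D, inhibition, habits))       # both bars at 0.0
print(response_specific(D, inhibition, habits))  # bar B still at 1.0
```

When massed extinction on bar A drives the pooled effective drive to zero, every response, bar B included, has zero reaction potential. Attaching IR to bar A alone leaves bar B available, which matches the alternation Hull (1952) reports.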

In an experiment highly relevant to this point, Smith and Hay (1954) took advantage of the great sensitivity of the rate of responding in the Skinner box to changes in drive. As soon as operant conditioning had led to a stable response rate, a discriminatory stimulus was introduced, the S^D always being reinforced, the S^Δ never. During the learning of the discrimination, the number of responses to S^D increased while the number of responses to S^Δ decreased, but the rate of responding remained constant. If the extinction of the response to S^Δ had involved D − IR, there should have been the decrease in over-all response rate which is associated with lowered drive. On the other hand, this finding is entirely consistent with Hull's formulation.


IR × sIR


Here we have a formulation which, if the rules of algebra are followed religiously in manipulating Hullian variables, leads to a paradox: a positive addition to reaction potential resulting from the interaction of two inhibitory variables. Jones (1958) even goes on to say that the paradoxical outcome of IR × sIR increasing sER might explain the "ultraparadoxical effect" described by Pavlov (1927). This might be called explanation by clang association.* It is difficult for the writer to understand why Jones and other revisers have so gratuitously regarded the minus sign as being permanently attached to IR and sIR. Though these quantities are subtracted from positive reaction potential, the negative sign is not necessarily an inherent part of these inhibition variables. Even if IR and sIR were multiplicatively related, there is no reason why their product could not be subtracted from the positive reaction potential.
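Expanding the revised formulation given above in the discussion of D − IR, sER = (D − IR) × (sHR − sIR), term by term shows where the paradox arises: the product of the two inhibitory quantities enters with a positive sign.

$$
{}_sE_R = (D - I_R)\,({}_sH_R - {}_sI_R) = D\,{}_sH_R - D\,{}_sI_R - I_R\,{}_sH_R + I_R\,{}_sI_R
$$

The last term is the "positive addition to reaction potential" just described; the terms D × sIR and IR × sHR are the ones taken up separately in the discussion that follows.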

The empirical evidence regarding the IR × sIR interaction is far from satisfactory, for there is always an "out" via the possible interaction of all the other intervening variables in the system. But in terms of sheer plausibility (and that is all one can go on at present), it must be said that IR × sIR is a weak formulation. The only relevant evidence comes from experiments on motor learning, the area in which there are rather clear-cut operational definitions of what constitutes IR and sIR. In general, performance decrement that dissipates during rest is identified with IR; the decrement that still remains after rest is identified with sIR.

* The "paradoxical" and "ultraparadoxical" effects observed by Pavlov, in which a weaker intensity of the CS will elicit a CR that had been extinguished to a stronger intensity of the CS, are probably best explained in terms of a generalization gradient on the stimulus intensity dimension. Because of the gradient, extinctive inhibition built up to a CS of one intensity will not be sufficient to inhibit the CR to a CS of a different intensity, even though it be weaker. Or the effect may be explained as disinhibition caused by a "novel" stimulus, novel because its intensity is weaker than that of the original CS.

Duncan (1951) gave two groups of subjects massed and distributed practice on the pursuit rotor. During this 5-minute practice period, the massed group presumably would develop more IR and hence more sIR. Then both groups were allowed 10 minutes of rest, so that at the beginning of the postrest trials, nearly all IR should have dissipated, leaving the two groups differing only in sIR. The postrest trials were massed for both groups. Here exist the very conditions which should allow an IR × sIR interaction to show itself. If there were an interaction, the postrest performance curves of the two groups should diverge. In fact, they did not diverge, or converge, but ran exactly parallel throughout the postrest trials, which suggests an additive rather than multiplicative relationship between IR and sIR. There are certain weaknesses and peculiarities in Duncan's study (for example, it could be argued that the 5 minutes' practice was not sufficient to attain the threshold of IR necessary for the development of sIR, the evidence for which has been presented by Kimble, 1950); but on the whole, it favors Hull's formulation regarding inhibition more than it favors those formulations which involve IR × sIR. Another study by rkweather and Duncan (1954) was essentially the same as the previous experiment except that the massed group was given more prerest practice so that performance on the first postrest trial would be the same for both massed and distributed groups. The rest period was 24 hours. Again, when both groups were given practice after the rest, their performance curves were approximately parallel, suggesting that there is no interaction between IR and sIR. It is possible to argue from some of the evidence in this study, however, that the presence of sIR was not clearly demonstrated.⁴
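Why parallel curves count against an interaction can be sketched numerically. In this hypothetical example both groups have the same drive and habit, the massed group carries more sIR, and IR builds up linearly over massed postrest trials. The `interactive` function adds a positive IR × sIR term (weight `k`) to the additive form, in the spirit of the expanded revision; `k` and every other number are assumptions chosen for illustration.

```python
def additive(D, H, I_R, sI_R):
    """Hull: sER = D * sHR - IR - sIR, with IR and sIR simply subtracted."""
    return D * H - I_R - sI_R

def interactive(D, H, I_R, sI_R, k=1.0):
    """Additive form plus a positive IR x sIR term of hypothetical weight k."""
    return D * H - I_R - sI_R + k * I_R * sI_R

D, H = 1.0, 1.0                       # hypothetical drive and habit, same for both groups
SI_MASSED, SI_DISTRIBUTED = 0.3, 0.1  # hypothetical: massed prerest practice left more sIR
IR_BY_TRIAL = [0.05 * t for t in range(10)]  # hypothetical IR build-up over postrest trials

def gaps(formula):
    """Distributed-group minus massed-group sER on each postrest trial."""
    return [formula(D, H, i_r, SI_DISTRIBUTED) - formula(D, H, i_r, SI_MASSED)
            for i_r in IR_BY_TRIAL]

print([round(g, 3) for g in gaps(additive)])     # same gap on every trial
print([round(g, 3) for g in gaps(interactive)])  # gap shrinks as IR grows
```

With no interaction term the gap between the groups is the same on every trial, so the curves run parallel. With any nonzero `k` the gap changes as IR accumulates, so the curves cannot stay parallel.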

Better evidence is presented by Bourne and Archer (1956). Groups trained under massed and distributed practice on the pursuit rotor were given 5 minutes' rest, and then all groups performed under massed conditions. The performance curves converged in the postrest period. But the convergence consisted of the performance of the previously distributed group reducing to that of the massed group. If the IR × sIR formulation were correct, the result should have been just the opposite, with the previously massed group showing an increase up to the level of the distributed group. The prerest practice was more prolonged in this study than in Duncan's, and it can be argued that there was a sufficient amount of sIR generated to permit the IR × sIR interaction to show itself. Yet, in another motor learning experiment specially designed to determine if there was an interaction between IR and sIR, Bowen, Ross, and Andrews (1956) failed to find any evidence of interaction. So while the evidence is not definitive on this point, the preponderance of it does not favor the IR × sIR formulation.

⁴ It seems fairly certain that the concept of sIR invoked to explain decremental phenomena in motor learning could not represent the same process as the sIR involved in experimental extinction.

The issue,
however, does not seem beyond a clear-cut experimental test. For example, in the Jones revision D × sIR would always have to be greater than IR × sIR, because there can be no performance when D is equal to or less than IR. If this were true, a person practicing on the pursuit rotor over a long period should finally become unable to perform, since sIR would continue to grow and inhibit performance. After IR had dissipated, D × sIR would approach or equal D × sHR, and the subject would be unable to perform the pursuit task. Gleitman, Nachmias, and Neisser (1954) were the first to point out this consequence with respect to Hull's formulation. As far as the writer knows, no one has ever found this kind of "extinction" of the pursuit rotor skill. Subjects have been known to practice the pursuit task day after day for months, long after having reached an asymptote for time on target, yet they show no loss of the skill. Hull's formula, on the other hand, can get around this problem, the arguments of Gleitman et al. (1954) notwithstanding. If sHR and sIR both reach an asymptote (Hull, 1951), extinction will have occurred when sIR = D × sHR. An increase in D will make it possible for D × sHR to be greater than the asymptote of sIR, so that extinction need never occur if D remains sufficiently high. Indeed, there are instances (Solomon & Wynne, 1954) of absence of extinction in escape and avoidance training in which the drive is a very strong shock-induced fear reaction.
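The difference between the two formulas at asymptote can be checked directly. This sketch assumes, hypothetically, that sHR has reached an asymptote of 1.0, that IR has fully dissipated, and that sIR has grown until it equals sHR, which is the situation Gleitman et al. describe; it then asks at which drive levels sER is still above zero.

```python
def revised(D, H, sI):
    """Jones-type revision with IR dissipated: sER = D * (sHR - sIR)."""
    return D * (H - sI)

def hull(D, H, sI):
    """Hull: sER = D * sHR - sIR."""
    return D * H - sI

H_ASYMPTOTE = 1.0  # hypothetical asymptote of sHR
SI_LEVEL = 1.0     # hypothetical: sIR has grown until it equals sHR

for D in (0.5, 1.0, 2.0, 4.0):
    print(f"D={D}: revised performs={revised(D, H_ASYMPTOTE, SI_LEVEL) > 0}, "
          f"Hull performs={hull(D, H_ASYMPTOTE, SI_LEVEL) > 0}")
```

Under the revised form sER stays at zero whatever the drive, so the skill would be lost. Under Hull's form any D greater than the ratio of the sIR asymptote to sHR keeps sER positive, which is how a sufficiently high drive can prevent extinction.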

The unlikely prediction made from Hull's theory by Gleitman et al. (1954) that any response, even though always reinforced, would eventually extinguish if it were repeated often enough was directly tested in experiments by Calvin, Clifford, Clifford, Bolden, and Harvey (1956) and Kendrick (1958). Their studies differ in a few details of experimental procedure. Essentially they ran rats down a long alleyway at the end of which the rats received reinforcement on every trial. After some hundreds of trials (spread over many days) all the rats ceased running down the alley; they would not leave the starting box for a specified period of time designated as the criterion for "complete" extinction. Though this outcome lends support to Hull's theory, other interpretations are certainly possible (see Mowrer, 1960, pp. 426–432; Prokasy, 1960). The results of the Calvin et al. and Kendrick experiments may well be due to peculiarities of the experimental procedure. If not, one should expect "extinction with reinforcement" to occur in many other kinds of performance, such as a rat's bar pressing or a pigeon's pecking in a Skinner box, and in many types of repetitious motor tasks.

One experiment is highly relevant to theoretical predictions regarding the effects of drive on motor learning. Wasserman (1951), using a motor learning task (alphabet printing), found that high motivation resulted in performance which was significantly superior to that of low motivation (in both massed and distributed practice groups), the difference becoming progressively greater as practice continued. The Jones revision would predict just the opposite. Since D must always be greater than IR, D × sIR would result in greater performance decrement for the highly motivated group. The motivation in this experiment was controlled by the instructions given to the subjects, one group being task-oriented, the other ego-oriented.


IR × sHR

This formulation of an interaction between reactive inhibition and habit strength implies that the decremental effects on performance caused by the conditions producing IR (effort and rate of response) will be greater for strong than for weak habits. This is patently incorrect, since it is known that there is a positive correlation between number of reinforced responses, of which sHR is a function, and the number of responses emitted during extinction. The IR × sHR formulation would predict just the opposite, i.e., a negative correlation between number of reinforcements and number of responses to extinction. This conclusion is not weakened by the fact (for example, Reid, 1953) that in learning to make a discrimination reversal the animals that have had a greater number of prereversal trials learn the reversal more quickly. This phenomenon may be interpreted in terms of the animal's also overlearning the act of making a discrimination (in addition to learning to respond differentially to the S^D and S^Δ), which facilitates the learning of the reversal.
sHR × sIR

This formulation, derived from Iwahara (1957), is subject to the same criticism just made in the case of IR × sHR. It implies that the stronger the habit, the more quickly it should extinguish, which certainly is not true.


K − İR

The suggestion of Woodworth and Schlosberg (1954), that total inhibition (İR = IR + sIR) be subtracted from incentive motivation, K (a function of amount of reinforcement), is plausible, in that extinction involves the withdrawal of incentive. In the total Hullian formulation, however, the Woodworth and Schlosberg suggestion meets with the same difficulties pointed out in the two previous cases. Thus:


$$ {}_sE_R = D \times (K - I_R - {}_sI_R) \times {}_sH_R $$

In expanded form:

$$ {}_sE_R = D \times K \times {}_sH_R - D \times I_R \times {}_sH_R - D \times {}_sI_R \times {}_sH_R $$


Thus we have again all of the elements that have already been criticized. Spence (1956) has argued, on the basis of experimental findings, that D and K are additive rather than multiplicative as in Hull. But here again the defects of the Woodworth and Schlosberg suggestion of K − İR are evident.

$$ {}_sE_R = (D + K - \dot{I}_R) \times {}_sH_R $$

Expanded:

$$ {}_sE_R = D \times {}_sH_R + K \times {}_sH_R - {}_sH_R \times \dot{I}_R $$

The last term in the expanded formula again meets with the same difficulty pointed out above. It must be concluded that the K − İR formulation is not an improvement on Hull or Spence.

SUMMARY 


Several attempts to reformulate 
Hull's theory with respect to the 
inhibition postulates have been criti- 
cized. Because of the limitations of 
both Hull and his revisers in the 
exact quantification of intervening 
variables, much of the choice between 
alternative versions of the theory 
must be made on the basis of plausi- 
bility of congruence with empirical 
findings rather than of prediction of 
these findings in the rigorous sense of 
the term. All of the attempted re- 
visions to date, with the possible 
exception of that of Spence, have 
serious shortcomings in the light of 
experimental evidence. They cannot, 
therefore, be regarded as improve- 




ments over Hull's original formula- 
tion of reaction potential. Advances 
will be made, not by the mere alge- 
braic manipulation of Hull's inter- 
vening variables—the method that 

The operation of reliable response
sets or stylistic consistencies has been 
frequently noted on personality and 
attitude scales with a true-false or 
agree-disagree format (cf. Cronbach, 
1946, 1950; Fricke, 1956; Messick & 
Jackson, 1958). It has recently been 
conjectured (Jackson & Messick, 
1958) that the major common factors 
in personality inventories of this type 
are interpretable primarily in terms 
of such stylistic consistencies rather 
than in terms of specific item content. 
The present paper attempts to anno- 
tate the influence of two response 
styles, the tendency to agree or ac- 
quiesce and the tendency to respond 
in a desirable way, using the Minne- 
sota Multiphasic Personality Inven- 
tory (MMPI) as an example of 
inventories with this general response 
form. In particular, a high correla- 
tion will be noted between factor 
loadings on the largest factor, as 
obtained in several published factor 
analyses of the MMPI, and certain 
indices of acquiescence. 

Barnes (1956b), in evaluating the Berg (1955) deviation hypothesis on the MMPI, found that the tendency to answer atypically or deviantly "true" was highly correlated with scores on the psychotic scales, and the tendency to answer atypically "false" was highly correlated with the neurotic triad. This result is consistent with the fact, noted by Cottle and Powell (1951) and others (Barnes, 1956b; Fricke, 1956), that a large proportion of MMPI psychotic items are keyed true and a large proportion of neurotic items keyed false, suggesting that differential tendencies to respond atypically "true" and "false" might have been involved in the discrimination of criterion groups upon which the scoring keys were based. Barnes (1956a) also pointed out a marked similarity between the correlations of MMPI scales with these two deviant response tendencies and factor loadings for the scales on the two major factors reported by Wheeler, Little, and Lehner (1951); he concluded that the number of atypical true answers is a "pure factor test" of the first or "psychotic" factor and that the number of deviant false answers has a high loading on the second or "neurotic" factor. The two major MMPI factors obtained by Welsh (1956) also displayed a similar pattern of loadings, and it is noteworthy that the "pure factor" reference scale A which Welsh developed for his first or "anxiety" factor had 38 out of 39 items keyed true, while the reference scale R for the second or "repression" factor had all 40 of its items keyed false.

1 This study is part of a larger project on stylistic determinants in clinical personality assessment supported by the National Institute of Mental Health, United States Public Health Service, under Research Grants M-2878 to Educational Testing Service and M-2738 to Pennsylvania State University. The authors wish to thank George S. Welsh for graciously supplying scoring keys for his "Pure" MMPI scales and Philip E. Slater for rived available his factor analyses of the

DOUGLAS N. JACKSON
Pennsylvania State University

In view of the striking similarity between the effects of consistent tendencies to respond "true" and "false" and patterns of factor loadings obtained in two studies of MMPI scales, all factor analyses of the MMPI readily available in the literature were reviewed, in order to evaluate the possible relationship between each scale's factor loading on the major factor and an index of its potential for reflecting acquiescence. The particular index of acquiescence used was the proportion of items keyed true on each scale, which, assuming that the acquiescence-evoking properties of items are uniform over all MMPI scales, can be considered to reflect the extent to which total scores on a scale are influenced by consistent tendencies to respond "true." High scores on a scale with a large proportion of items keyed true would thus be assumed to reflect a general tendency to acquiesce, in addition, of course, to the contribution of other stylistic tendencies and of systematic content responses. Jackson (1960) used this index to evaluate the effects of acquiescence on the California Psychological Inventory, and Voas (1958) used the proportion of items keyed false as a criterion for constructing response bias scales. Voas (1958) also estimated loadings for scales from the MMPI and the Guilford-Zimmerman Temperament Survey on a factor marked by two measures of the tendency to respond "false" and found that these loadings correlated .86 with the proportion of items keyed false on each scale. These findings support the use of the index in the present context.
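As a small worked example, the index can be computed for Welsh's two reference scales from the item counts cited above. The function name is illustrative; the arithmetic is simply the proportion described in the text.

```python
def proportion_keyed_true(n_keyed_true, n_items):
    """Acquiescence index: share of a scale's items keyed 'true'."""
    return n_keyed_true / n_items

# Item counts for Welsh's (1956) reference scales, as reported in the text.
scale_a = proportion_keyed_true(38, 39)  # first ("anxiety") factor scale
scale_r = proportion_keyed_true(0, 40)   # second ("repression") factor scale
print(round(scale_a, 3), scale_r)        # 0.974 0.0
```

Scale A thus sits near the top of the possible range and scale R at the bottom, which is why the two scales are such clear cases of keying direction.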

Factor loadings for MMPI scales 
were obtained from eight studies 
by Abrams (1949, summarized by 
French, 1953), Cook and Wherry 
(1950), Cottle (1950), Tyler (1951), 
Wheeler, Little, and Lehner (1951), 
Welsh (1956), Slater (1958), and Kas- 
sebaum, Couch, and Slater (1959). A 
fairly uniform finding from these 
studies is that only two major factors
and two or three minor ones are 
necessary to account for interrela- 
tions among the scales. Spearman 
rank correlations were computed 
between loadings on the largest factor 
in each study and the proportion of 
items keyed true on each scale; the 
results are summarized in Table 1. In 
some of the factor analyses, values 
were not reported for scales with 
small loadings on the factor, so 
in computing correlation coefficients 
these scales were considered to be 
tied at an appropriate rank below 
scales with reported positive loadings 
and above scales with reported nega- 
tive loadings. Corrections for ties (cf. 
Siegel, 1956) were computed for two 
of the studies with the most scales 
tied at the same rank (Wheeler, 
Little, & Lehner’s normal sample and 
Tyler’s sample), but the coefficients 
changed only .01. 
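The rank correlation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' computation: unreported loadings are coded as 0.0 so that they tie below the positive and above the negative reported loadings, and ties are handled by giving tied scales the mean of their ranks and then taking Pearson's r of the ranks, which is one standard tie-corrected form of Spearman's ρ. The helper names and all numbers are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r on mid-ranks (tie-corrected)."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical scales: loadings on the largest factor (0.0 = not reported)
# and the proportion of each scale's items keyed "true".
loadings = [0.8, 0.6, 0.0, 0.0, -0.5]
prop_true = [0.7, 0.6, 0.4, 0.5, 0.2]
rho = spearman_rho(loadings, prop_true)  # about .975
```

Coding the unreported loadings as a single tied value is what places them "at an appropriate rank" between the positive and negative reported loadings.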

Of 11 different subject samples 
represented in these eight studies, 
significant correlations were obtained 
for 8 of them, four of the coefficients 
exceeding .85. These strikingly con- 
sistent findings indicate that in most 
of these studies the largest factor on 
the MMPI is interpretable in terms 
of acquiescence. In evaluating the 
few apparently inconsistent results, 
it is important to note that for 
Abrams’s (1949) neurotic sample, the 
correlation with the largest factor 
was −.15, but with the second largest
it was .52. Also, in Tyler’s (1951) 
study the correlation with the largest 
rotated factor was .33, but with the 
unrotated first centroid it was .52,
p<.05. These findings suggest that
for those studies in which the corre- 
spondence between the proportion of 
items keyed true and the factor 
loadings was not close, the factor 
structures could have been rotated to 
produce a higher correlation.

TABLE 1
SPEARMAN RANK CORRELATION (ρ) BETWEEN FACTOR LOADINGS ON THE LARGEST MMPI
FACTOR AND PROPORTION OF ITEMS KEYED “TRUE” ON EACH SCALE

Study | Scales Included | Sample | ρ
Abrams, 1949 | 11 scales: L, F, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma | 117 normal male veterans | .907**
 | | 201 neurotic male veterans | −.148 (largest factor); .516 (2nd largest)
Cook & Wherry, 1950 | 11 scales: L, F, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma | [illegible] | [illegible]
Cottle, 1950 | 11 scales: L, F, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma | 400 male veterans | .916**
Tyler, 1951 | 15 scales: Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma, Si, St, Pr, Ac, [two illegible] | 107 female [illegible] | [illegible]
Wheeler, Little, & Lehner, 1951 | 12 scales: L, K, F, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma | normal sample [details illegible] | [illegible]
 | | 110 male neuropsychiatric patients | .874**
Welsh, 1956 | 11 pure scales: K′, Hs′, D′, Hy′, Pd′, Mf′, Pa′, Pt′, Sc′, Ma′, Si′ | 150 male VA hospital patients | .870**
 | 11 pure scales plus A, Gm, Ja, R | same 150 males | .897**
Slater, 1958 | 43 scales: L, F, K, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma, Si, Nm, Dp, Fm, A, R, Im, Pr, To, C, P, Sp, Rb, Sy, Re, St, Lp, Do, Es, Ie, Ac, Ai, O-I, Lb, Ne, Ca, Pl, Ht, Cht, Zi, Za | 102 aged males | .723**
 | | 109 aged females | .718
Kassebaum, Couch, & Slater, 1959 | 32 scales: L, F, K, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma, Si, Es, Ie, Lp, Ai, Sy, Ac, Re, Do, Pr, St, Im, Sp, Fm, Rp, R, A, Dp, To, O-I | 160 Harvard College freshmen | .625**

Analytical procedures similar to the
computation of B weights in multiple 
correlation analysis are available 
(Mosier, 1939) for rotating to maxi- 
mize the correlation between a factor 
and a criterion, which in this case 
would be a vector of proportions of 
true items. However, an adequate 
application of this technique requires 
loadings for all the scales on the fac- 
tors under consideration, and for 
those studies providing this informa- 
tion (e.g., Welsh, 1956) there was 
usually little need to rotate. 
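The idea of rotating toward a criterion can be sketched for the simple case of two orthogonal factors. This is not Mosier's (1939) procedure itself; it only illustrates the regression analogy the text draws, assuming loadings for every scale on both factors are available: the axis in the plane of the two factors whose projected loadings correlate most highly with the criterion (here, proportion of items keyed true) points in the direction of the least-squares regression weights of the criterion on the centered loading columns. All data and function names are hypothetical.

```python
import math


def centered(v):
    """Deviations of each value from the mean of v."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    x, y = centered(x), centered(y)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def best_axis(f1, f2, crit):
    """Unit vector in the plane of two factors whose projected loadings correlate
    most highly with crit; proportional to the regression weights of crit on
    the centered loading columns."""
    a, b, c = centered(f1), centered(f2), centered(crit)
    s11 = sum(x * x for x in a)
    s22 = sum(y * y for y in b)
    s12 = sum(x * y for x, y in zip(a, b))
    s1c = sum(x * z for x, z in zip(a, c))
    s2c = sum(y * z for y, z in zip(b, c))
    det = s11 * s22 - s12 * s12  # assumed nonzero (loading columns not collinear)
    w1 = (s22 * s1c - s12 * s2c) / det
    w2 = (s11 * s2c - s12 * s1c) / det
    norm = math.hypot(w1, w2)
    return w1 / norm, w2 / norm


def project(f1, f2, axis):
    """Loadings of each scale on the rotated axis."""
    return [axis[0] * x + axis[1] * y for x, y in zip(f1, f2)]


# Hypothetical loadings of six scales on two orthogonal factors, and the
# proportion of items keyed "true" on each scale (the criterion vector).
f1 = [0.7, 0.5, 0.3, -0.2, -0.4, 0.1]
f2 = [0.1, -0.3, 0.5, 0.4, -0.2, 0.6]
prop_true = [0.60, 0.55, 0.50, 0.30, 0.20, 0.45]

axis = best_axis(f1, f2, prop_true)
r_best = pearson_r(project(f1, f2, axis), prop_true)
r_unrotated = pearson_r(f1, prop_true)
```

The rotation angle from the first factor is `math.degrees(math.atan2(axis[1], axis[0]))`; an orthogonal second axis would be `(-axis[1], axis[0])`. If the first factor already lies close to `axis`, little is gained by rotating, which matches the remark about Welsh's data.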

Another consideration which sug- 
gests that a rotation of axes might 
clarify the role of acquiescence on the 
MMPI is the fact that scales with 
high loadings on the second largest 
MMPI factor usually tend to have a 
high proportion of false items in their 


keys. Kassebaum, Couch, and Slater 
(1959) noticed this in their factor 
results and suggested that their 
second factor partly reflected a gen- 
eral tendency to respond “false.” 
Although correlations between the 
proportion of items keyed true and 
loadings on the second MMPI factor 
are usually not nearly as high as cor- 
relations with the first factor, some 
significant coefficients occur; e.g., the
correlation between the proportion of 
items keyed true and loadings on the 
second factor in the study by Kasse- 
baum, Couch, and Slater (1959) was 
−.44, p<.05 with 30 df, and in
Welsh’s (1956) study it was −.64,
p<.05 with 13 df. 

This result is consistent with 
Barnes’ (1956a) finding of a corre- 
spondence between atypical true 
answers and the first MMPI factor
and atypical false answers and the 
second factor. Since these two factors 
are usually orthogonal, this corre- 
spondence might be considered evi- 
dence for two relatively independent 
response biases, one a tendency to 
agree and the other to disagree. 
Such a contention is consistent with 
Barnes’ (1956b) finding of a correla- 
tion of .11 between deviant responses 
answered “true” and “false” and
with the fact that Welsh’s (1956) A 
and R scales are usually only slightly 
negatively correlated. Although these 
results cannot be accounted for by a 
simple response set of acquiescence, it 
is not necessary to postulate two 
independent sets to agree and to 
disagree. As has been pointed out 
(Jackson & Messick, 1958), all that is
required to account for the findings is
the operation of at least one other
factor in conjunction with acquies-
cence. Thus, the A scale can have a
high positive loading on an acquies-
cence factor and the R scale a high
negative loading, yet the two scales
could be uncorrelated if they both
had positive, or negative, loadings on
some other dimension. Other factors
which could moderate the operation
of acquiescence on the MMPI might
be specific content dimensions or
some other response style. As previ-
ously suggested (Jackson & Messick,
1958), a particularly likely candidate
for such a role is the stylistic tend-
ency to respond in a desirable way.
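The argument about the A and R scales can be checked with simple arithmetic. Assuming, for illustration only, a model with two orthogonal common factors and uncorrelated unique parts, the correlation implied between two scales is the sum of the products of their loadings. The loadings below are hypothetical: opposite signs on acquiescence, the same sign on a second dimension.

```python
def implied_correlation(loadings_x, loadings_y):
    """Correlation implied for two scales by orthogonal common factors:
    the sum of products of their loadings (unique parts assumed uncorrelated)."""
    return sum(a * b for a, b in zip(loadings_x, loadings_y))


# Hypothetical loadings: (acquiescence, some other dimension).
a_scale = (0.6, 0.5)
r_scale = (-0.5, 0.6)

acquiescence_only = implied_correlation(a_scale[:1], r_scale[:1])  # -0.30
both_factors = implied_correlation(a_scale, r_scale)               # 0.00
```

With acquiescence alone the two scales would correlate −.30; the positive loadings of both scales on the second dimension contribute +.30 and cancel it.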
Possible influences on MMPI scores
of a set to respond desirably have
been widely documented (cf. De Soto 
& Kuethe, 1959; Edwards, 1957; 
Fordyce, 1956; Hanley, 1956, 1957; 
Jackson & Messick, 1958; Taylor, 
1959; Wiggins & Rumrill, 1959). 
Fordyce (1956), for example, has 
noted a marked similarity between
loadings on the largest MMPI factor
from Wheeler, Little, and Lehner’s 
(1951) psychiatric sample and corre- 
lations of MMPI scales with a meas- 
ure of desirability. In fact, the rank 
correlation between the loadings and 
the correlation coefficients is approxi-
mately −.75, and since the propor-
tion of items keyed true on each
MMPI scale correlates only about
−.50 with the desirability coeffi-
cients, it seems likely that a combina-
tion of desirability and acquiescence 
would lead to even better prediction 
of the factor (cf. Messick, 1959). 
Although this and some other re- 
ported relationships are somewhat 
equivocal because the measures of 
desirability used were partially con- 
founded with acquiescence, e.g., Ed- 
wards’ SD scale and Hanley’s Ex 
scale, high correlations have also been 
reported between MMPI scales and 
desirability measures having a bal- 
anced number of true and false items 
(Edwards, 1957; Hanley, 1957; Wig-
gins & Rumrill, 1959).
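The suggestion that desirability and acquiescence together would predict the factor better than either alone can be given rough arithmetic form with the standard two-predictor multiple correlation formula. This is only an illustration, not a result reported in the text: it treats the rank correlations as if they were product-moment coefficients, uses the approximate values −.75 and −.50 quoted above, and takes .874 from Table 1 for the same Wheeler, Little, and Lehner psychiatric sample.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with two predictors, from zero-order correlations."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# y = loadings on the largest factor (Wheeler, Little, & Lehner psychiatric sample).
r_loadings_desirability = -0.75   # approximate value given in the text
r_loadings_prop_true = 0.874      # Table 1, 110 male neuropsychiatric patients
r_desirability_prop_true = -0.50  # approximate value given in the text

R = multiple_r(r_loadings_desirability, r_loadings_prop_true, r_desirability_prop_true)
```

Under these assumptions R comes out near .95, above either predictor alone, which is consistent with the claim in direction though not a test of it.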

In an attempt to take these find- 
ings into account, it is suggested that 
the acquiescence-evoking properties 
of items are not, as assumed above, 
uniform over all scales, but that 
acquiescence is elicited differentially 
as a function, perhaps, of specific 
item content, of the clarity or ambi- 
guity with which the content is 
stated, and in particular of the per- 
ceived desirability of the statement. 
In the extreme, it is suggested that 
the two major factors usually found 
for the MMPI may be rotated into 
positions interpretable as two re- 
sponse styles—the tendency to ac- 
quiesce and the tendency to respond 
desirably. The negative poles of these 
dimensions would be the tendencies 
to disagree and to respond undesir-
ably, respectively. Response vari-
ance on MMPI scales would then be
primarily a function of these two styl- 
istic components in various weighted 
proportions. Studies including inde- 
pendent marker variables for the two 
styles are of course required to 
identify the factor positions. Much 
research is also needed into the pre- 
cise nature of the set to respond 
desirably, particularly in view of 
three complicating results: (a) the 
finding of consistent individual differ- 
ences in judgments of desirability 
(Messick, 1960); (b) the distinction 
between personal and social desira- 
bility (Borislow, 1958; Rosen, 1956); 
and (c) the differentiation between a
tendency to endorse certain desirable
items which exhibit large mean shifts
under desirability instructions and
the tendency to endorse other desir-
able items which presumably reflect a
group norm (Voas, 1958; Wiggins,
1959).

In conclusion, the findings offer 
clear evidence that acquiescence, as 
moderated by item desirability, plays 
a dominant role in personality inven- 
tories like the MMPI. Focused 
empirical investigations are required 
to develop a refined interpretation of 
these and other stylistic consistencies 
in terms of personality organization 
and psychopathology. 

The recent rise of interest in the use
of nonparametric tests stems from 
two main sources. One is the concern 
about the use of parametric tests 
when the underlying assumptions are 
not met. The other is the problem of 
whether or not the measurement scale 
is suitable for application of paramet- 
ric procedures. On both counts 
parametric tests are generally more in 
danger than nonparametric tests. 
Because of this, and because of a 
natural enthusiasm for a new tech- 
nique, there has been a sometimes 
uncritical acceptance of nonparamet- 
ric procedures. By now a certain 
degree of agreement concerning the 
more practical aspects involved in the 
choice of tests appears to have been 
reached. However, the measurement 
theoretical issue has been less clearly 
resolved. The principal purpose of 
this article is to discuss this latter 
issue further. For the sake of com- 
pleteness, a brief overview of practi- 
cal statistical considerations will also 
be included. 

A few preliminary comments are 
needed in order to circumscribe the 
subsequent discussion. In the first 
place, it is assumed throughout that 
the data at hand arise from some sort
of measuring scale which gives nu-
merical results. This restriction is
implicit in the proposal to compare
parametric and nonparametric tests

1 An earlier version of this paper was presented at the April 1959 meetings of the Western Psychological Association. The author’s thanks are due F. N. Jones and J. B. Sidowski for their helpful comments.

since the former do not apply to
strictly categorical data (but see 
Cochran, 1954). Second, parametric 
tests will mean tests of significance 
which assume equinormality, i.e., 
normality and some form of homo- 
geneity of variance. For convenience, 
parametric test, F test, and analysis 
of variance will be used synony- 
mously. Although this usage is not 
strictly correct, it should be noted 
that the t test and regression analysis
may be considered as special applica- 
tions of F. Nonparametric tests will 
refer to significance tests which make 
considerably weaker distributional 
assumptions as exemplified by rank 
order tests such as the Wilcoxon T,
the Kruskal-Wallis H, and by the 
various median-type tests. Third, the 
main focus of the article is on tests of 
significance with a lesser emphasis on 
descriptive statistics. Problems of 
estimation are touched on only 
slightly although such problems are 
becoming increasingly important. 
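The remark that the t test may be treated as a special application of F can be verified numerically: for two groups, the square of the pooled-variance t equals the one-way analysis of variance F. A small sketch with hypothetical 7-point ratings:

```python
import math


def pooled_t(x, y):
    """Two-sample t with pooled variance (assumes equal population variances)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_var = ss / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


def one_way_f(groups):
    """One-way analysis of variance F ratio for a list of groups."""
    scores = [v for g in groups for v in g]
    grand = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical 7-point ratings for two groups.
group_1 = [3, 5, 4, 6, 7]
group_2 = [2, 3, 1, 4, 3]
t = pooled_t(group_1, group_2)
F = one_way_f([group_1, group_2])  # equals t ** 2
```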

Finally, a word of caution is in 
order. It will be concluded that 
parametric procedures constitute the 
everyday tools of psychological sta- 
tistics, but it should be realized that 
any area of investigation has its own 
statistical peculiarities and that gen-
eral statements must always be
adapted to the prevailing practical 
situation. In many cases, as in pilot 
work, for instance, or in situations in 
which data are cheap and plentiful, 
nonparametric tests, shortcut para- 
metric tests (Tate & Clelland, 1957), 
or tests by visual inspection may well 
be the most efficient. 

PRACTICAL STATISTICAL
CONSIDERATIONS 


The three main points of compari- 
son between parametric and non- 
parametric tests are significance level, 
power, and versatility. Most of the 
relevant considerations have been 
treated adequately by others and 
only a brief summary will be given 
here. For more detailed discussion, 
the articles of Cochran (1947), Sav- 
age (1957), Sawrey (1958), Gaito 
(1959), and Boneau (1960) are espe- 
cially recommended. 

Significance level. The effects of 
lack of equinormality on the signifi- 
cance level of parametric tests have 
received considerable study. The 
two handiest sources for the psy- 
chologist are Lindquist’s (1953) cita- 
tion of Norton’s work, and the recent 
article of Boneau (1960) which sum- 
marizes much of the earlier work. 
The main conclusion of the various 
investigators is that lack of equi- 
normality has remarkably little effect 
although two exceptions are noted: 
one-tailed tests and tests with con- 
siderably disparate cell n’s may be 
rather severely affected by unequal 
variances.2
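The exception for disparate cell n's can be seen in a small Monte Carlo sketch. It assumes normal populations with equal means but standard deviations of 4 and 1, and compares the pooled t test (nominal two-tailed .05 level, critical value 2.048 for 28 df taken from a t table) when the same 30 cases are split 15/15 versus 5/25 with the larger variance in the smaller group. The sample sizes, seed, and helper names are illustrative choices, not taken from the cited studies.

```python
import math
import random


def pooled_t(x, y):
    """Two-sample t with pooled variance (assumes equal population variances)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    return (mx - my) / math.sqrt(ss / (nx + ny - 2) * (1 / nx + 1 / ny))


def rejection_rate(n1, sd1, n2, sd2, t_crit, reps=10000, seed=1):
    """Share of null-true experiments (equal population means) in which the
    pooled t test rejects at the given two-tailed critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(x, y)) > t_crit:
            hits += 1
    return hits / reps


T_CRIT_28_DF = 2.048  # two-tailed .05 critical value of t with 28 df (t table)

# Same total N and same population SDs (4 vs 1); only the allocation differs.
equal_n = rejection_rate(15, 4.0, 15, 1.0, T_CRIT_28_DF)
disparate_n = rejection_rate(5, 4.0, 25, 1.0, T_CRIT_28_DF)
```

With equal n the empirical rate stays close to the nominal .05; with the 5/25 split it is several times larger. Placing the larger variance in the larger group instead tends to make the test conservative.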

A somewhat different source of 
perturbation of significance level 
should also be mentioned. An over- 
all test of several conditions may 
show that something is significant 
but will not localize the effects. As is 
well known, the common practice of t 
testing pairs of means tends to in- 
flate the significance level even when 
the over-all F is significant. An 


2 The split-plot designs (e.g., Lindquist, 
1953) commonly used for the analysis of re- 
peated or correlated observations have been 
subject to some criticism (Cotton, 1959; 
Greenhouse & Geisser, 1959) because of the 
additional assumption of equal correlation 
which is made. However, tests are available 
which do not require this assumption (Cotton, 
1959; Greenhouse & Geisser, 1959; Rao, 1952).


analogous inflation occurs with non- 
parametric tests. There are para- 
metric multiple comparison proce- 
dures which are rigorously applicable 
in many such situations (Duncan, 
1955; Federer, 1955) but analogous 
nonparametric techniques have as 
yet been developed in only a few 
cases. 
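The inflation produced by testing all pairs of means can be illustrated by simulation. The sketch below assumes five normal groups of 10 with identical population means, runs all 10 pairwise pooled t tests at the nominal two-tailed .05 level (critical value 2.101 for 18 df), and counts experiments with at least one "significant" pair. Unlike the situation in the text, it does not first require a significant over-all F, and each pairwise t uses only the two groups involved. Parameters and names are illustrative.

```python
import itertools
import math
import random


def pooled_t(x, y):
    """Two-sample t with pooled variance."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    return (mx - my) / math.sqrt(ss / (nx + ny - 2) * (1 / nx + 1 / ny))


def any_pairwise_rejection(k, n, t_crit, reps=4000, seed=2):
    """Share of null-true experiments with k groups of n in which at least one
    of the k(k-1)/2 pairwise pooled t tests rejects at t_crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        if any(abs(pooled_t(a, b)) > t_crit
               for a, b in itertools.combinations(groups, 2)):
            hits += 1
    return hits / reps


T_CRIT_18_DF = 2.101  # two-tailed .05 critical t with 18 df (10 + 10 - 2)
rate = any_pairwise_rejection(k=5, n=10, t_crit=T_CRIT_18_DF)
```

The experimentwise rate comes out several times the nominal .05, which is the inflation that multiple comparison procedures such as Duncan's are designed to control.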

Power. As Dixon and Massey 
(1957) note, rank order tests are 
nearly as powerful as parametric 
tests under equinormality. Con- 
sequently, there would seem to be no 
pressing reason in most investiga- 
tions to use parametric techniques 
for reasons of power if an appropriate 
rank order test is available (but see 
Snedecor, 1956, p. 120). Of course, 
the loss of power involved in dichoto- 
mizing the data for a median-type 
test is considerable. 
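The comparison of power can be made roughly quantitative with standard large-sample (Pitman) asymptotic relative efficiencies for a shift between two normal populations. These are textbook results rather than figures from the sources cited here; each ratio is the number of cases the t test needs divided by the number the other test needs for equal power.

```latex
\mathrm{ARE}(\text{Wilcoxon rank sum},\ t) = \frac{3}{\pi} \approx 0.955,
\qquad
\mathrm{ARE}(\text{median test},\ t) = \frac{2}{\pi} \approx 0.637
```

In large samples, then, the rank test needs only about π/3 ≈ 1.05 times as many cases as t, while the median test needs about π/2 ≈ 1.57 times as many.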

Although it might thus be argued 
that rank order tests should be gen- 
erally used where applicable, it is to 
be suspected that such a practice 
would produce negative transfer to 
the use of the more incisive experi- 
mental designs which need para- 
metric analyses. The logic and com- 
puting rules for the analysis of vari- 
ance, however, follow a uniform pat- 
tern in all situations and thus provide 
maximal positive transfer from the 
simple to the more complex experi- 
ments. 

There is also another aspect of 
power which needs mention. Not in- 
frequently, it is possible to use exist- 
ing data to get a rough idea of the 
chances of success in a further related 
experiment, or to estimate the N re- 
quired for a given desired probability 
of success (Dixon & Massey, 1957, 
Ch. 14). Routine methods are avail- 
able for these purposes when para- 
metric statistics are employed but 
similar procedures are available only 
for certain nonparametric tests such 
as chi square. 
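One routine parametric method of this kind is the normal-approximation formula for the number of cases per group needed to detect a given standardized difference between two means. The sketch below is that textbook formula, not the specific procedure of Dixon and Massey; the effect size would come from existing data, and the values used here are hypothetical. `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the expected difference divided by the common standard
    deviation (normal approximation; slightly optimistic for small n)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


n_medium = n_per_group(0.5)  # difference of half a standard deviation
n_large = n_per_group(1.0)   # difference of a full standard deviation
```

For a difference of half a standard deviation this gives 63 cases per group at α = .05 and power .80; doubling the expected difference cuts the requirement to 16.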

Versatility. One of the most re-
markable features of the analysis of
variance is the breadth of its ap- 
plicability, a point which has been 
emphasized by Gaito (1959). For 
present purposes, the ordinary fac- 
torial design will serve to exemplify 
the issue. Although factorial designs 
are widely employed, their uses in the 
investigation and control of minor 
variables have not been fully ex- 
ploited. Thus, Feldt (1958) has 
noted the general superiority of the 
factorial design in matching or equat- 
ing groups, an important problem 
which is but poorly handled in cur- 
rent research (Anderson, 1959). 
Similarly, the use of replications as a 
factor in the design makes it possible 
to test and partially control for drift 
or shift in apparatus, procedure, or 
subject population during the course 
of an experiment. In the same way, 
taking experimenters or stimulus ma- 
terials as a factor allows tests which 
bear on the adequacy of standardiza- 
tion of the experimental procedures 
and on the generalizability of the re- 
sults. 
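The use of replications as a factor can be sketched with an ordinary balanced two-way analysis of variance in which one factor is the treatment and the other is the replication (for instance, successive blocks of sessions). A large replication effect would point to drift over the course of the experiment; a large treatment-by-replication interaction would suggest the treatment effect itself shifted. The data and function name are hypothetical.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA with n scores per cell.

    cells[i][j] holds the scores for level i of factor A (e.g., treatment)
    and level j of factor R (e.g., replication, run in successive periods).
    Returns sums of squares, degrees of freedom, and F ratios against the
    within-cells mean square."""
    a, r, n = len(cells), len(cells[0]), len(cells[0][0])
    scores = [x for row in cells for cell in row for x in cell]
    grand = sum(scores) / len(scores)
    cell_mean = [[sum(cell) / n for cell in row] for row in cells]
    a_mean = [sum(row) / r for row in cell_mean]
    r_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(r)]

    ss = {
        "A": r * n * sum((m - grand) ** 2 for m in a_mean),
        "R": a * n * sum((m - grand) ** 2 for m in r_mean),
    }
    ss_cells = n * sum((m - grand) ** 2 for row in cell_mean for m in row)
    ss["AxR"] = ss_cells - ss["A"] - ss["R"]
    ss["within"] = sum((x - cell_mean[i][j]) ** 2
                       for i in range(a) for j in range(r) for x in cells[i][j])
    df = {"A": a - 1, "R": r - 1, "AxR": (a - 1) * (r - 1), "within": a * r * (n - 1)}
    ms_within = ss["within"] / df["within"]
    f = {k: (ss[k] / df[k]) / ms_within for k in ("A", "R", "AxR")}
    return ss, df, f


# Hypothetical data: 2 treatments x 3 replications x 2 subjects per cell,
# with scores drifting upward across replications.
cells = [
    [[5, 6], [6, 7], [8, 9]],
    [[3, 4], [4, 4], [6, 7]],
]
ss, df, f = two_way_anova(cells)
```

Here the replication F is large while the interaction F is small: the drift is real but affects both treatments about equally, so the treatment comparison is not compromised by it.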

An analogous argument could be
given for latin squares, largely re- 
habilitated by the work of Wilk and 
Kempthorne (1955), which are useful 
when subjects are given successive 
treatments; for orthogonal poly- 
nomials and trend tests for corre- 
lated scores (Grant, 1956) which give 
the most sensitive tests when the in- 
dependent variable is scaled; as well 
as for the multivariate analysis of 
variance (Rao, 1952) which is appli- 
cable to correlated dependent vari- 
ables measured on incommensurable 
scales.
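The orthogonal polynomial idea can be sketched for the simplest case: four equally spaced levels of the independent variable, independent groups with equal n. The coefficients are the usual tabled values; each trend contrast carries one degree of freedom, and the three trend sums of squares add up to the between-groups sum of squares. Grant's (1956) tests for correlated scores use a different error term, which is not shown here. The means are hypothetical.

```python
# Tabled orthogonal polynomial coefficients for four equally spaced levels.
TREND_COEFFS = {
    "linear": (-3, -1, 1, 3),
    "quadratic": (1, -1, -1, 1),
    "cubic": (-1, 3, -3, 1),
}


def trend_ss(means, n, coeffs):
    """Sum of squares (1 df) for a contrast among equal-n group means."""
    estimate = sum(c * m for c, m in zip(coeffs, means))
    return n * estimate ** 2 / sum(c * c for c in coeffs)


# Hypothetical means for 4 equally spaced levels, n = 5 subjects per group.
means = [2.0, 4.0, 5.0, 5.5]
n = 5
trend = {name: trend_ss(means, n, c) for name, c in TREND_COEFFS.items()}

grand = sum(means) / len(means)
ss_between = n * sum((m - grand) ** 2 for m in means)  # equals sum(trend.values())
```

Each trend sum of squares would then be divided by the within-groups mean square to give a 1-df F. In this example almost all of the between-groups variation is linear.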

The point to these examples and to 
the more extensive treatment by 
Gaito is straightforward. Their anal- 
ysis is more or less routine when 
parametric procedures are used.
However, they are handled inade-
quately or not at all by current non-
parametric methods.

It thus seems fair to conclude that 
parametric tests constitute the stand- 
ard tools of psychological statistics. 
In respect of significance level and 
power, one might claim a fairly even 
match. However, the versatility of 
parametric procedures is quite un- 
matched and this is decisive. Unless 
and until nonparametric tests are de- 
veloped to the point where they meet 
the routine needs of the researcher as 
exemplified by the above designs, 
they cannot realistically be con- 
sidered as competitors to parametric 
tests. Until that day, nonparametric 
tests may best be considered as use- 
ful minor techniques in the analysis 
of numerical data. 

Too promiscuous a use of F is, of 
course, not to be condoned since there 
will be many situations in which the 
data are distributed quite wildly. 
Although there is no easy rule with 
which to draw the line, a frame of 
reference can be developed by study-
ing the results of Norton (Lindquist,
1953) and of Boneau (1960). It is 
also quite instructive to compare p 
values for parametric and nonpara- 
metric tests of the same data. 

It may be worth noting that one of 
the reasons for the popularity of non- 
parametric tests is probably the cur- 
rent obsession with questions of sta- 
tistical significance to the neglect of 
the often more important questions of 
design and power. Certainly some 
minimal degree of reliability is gen- 
erally a necessary justification for 
asking others to spend time in assess- 
ing the importance of one’s data. 
However, the question of statistical 
significance is only a first step, and a 
relatively minor one at that, in the 
over-all process of evaluating a set of 
results. To say that a result is sta- 
tistically significant simply gives 
reasonable ground for believing that 
some nonchance effect was obtained.
The meaning of a nonchance effect 
rests on an assessment of the design of 
the investigation. Even with judi- 
cious design, however, phenomena 
are seldom pinned down in a single 
study so that the question of replica- 
bility in further work often arises 
also. The statistical aspects of
these two questions are not without
importance but tend to be neglected 
when too heavy an emphasis is 
placed on p values. As has been 
noted, it is the parametric procedures 
which are the more useful in both re- 
spects. 


MEASUREMENT SCALE 
CONSIDERATIONS 


The second and principal part of 
the article is concerned with the rela- 
tions between types of measurement 
scales and statistical tests. For con- 
venience, therefore, it will be as- 
sumed that lack of equinormality 
presents no serious problem. Since 
the F ratio remains constant with 
changes in unit or zero point of the 
measuring scale, we may ignore ratio 
scales and consider only ordinal and 
interval scales. These scales are de- 
fined following Stevens (1951). 
Briefly, an ordinal scale is one in 
which the events measured are, in 
some empirical sense, ordered in the 
same way as the arithmetic order of 
the numbers assigned to them. An 
interval scale has, in addition, an 
equality of unit over different parts 
of the scale. Stevens goes on to char- 
acterize scale types in terms of 
permissible transformations. For an
ordinal scale, the permissible trans-
formations are monotone since they
leave rank order unchanged. For an
interval scale, only the linear trans-
formations are permissible since
only these leave relative distance
unchanged. Some workers (e.g.,
Coombs, 1952) have considered vari-
ous scales which lie between the or-
dinal and interval scales. However,
it will not be necessary to take this 
further refinement of the scale typol- 
ogy into account here. 
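The statement that F remains constant under changes of unit or zero point (a linear transformation) can be checked directly. For contrast, the sketch also applies a monotone but nonlinear transformation, which is permissible for an ordinal scale yet changes F. The scores and transformations are hypothetical.

```python
def one_way_f(groups):
    """One-way analysis of variance F ratio for a list of groups."""
    scores = [v for g in groups for v in g]
    grand = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(scores) - len(groups)))


def linear(x):
    return 10 + 3 * x  # new zero point and unit


def cube(x):
    return x ** 3  # order-preserving (monotone) but not linear


groups = [[1, 2, 3, 4], [2, 4, 6, 7]]  # hypothetical scores for two groups

f_raw = one_way_f(groups)
f_linear = one_way_f([[linear(x) for x in g] for g in groups])
f_cubed = one_way_f([[cube(x) for x in g] for g in groups])
```

`f_linear` equals `f_raw` apart from rounding error, whereas `f_cubed` differs from both.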

As before, we suppose that we have 
a measuring scale which assigns num- 
bers to events of a certain class. It is 
assumed that this measuring scale is 
an ordinal scale but not necessarily 
an interval scale. In order to fix 
ideas, consider the following example. 
Suppose that we are interested in 
studying attitude toward the church. 
Subjects are randomly assigned to 
two groups, one of which reads Com-
munication A, while the other reads
Communication B. The subjects’ 
attitudes towards the church are 
then measured by asking them to 
check a seven category pro-con rating 
scale. Our problem is whether the 
data give adequate reason to con- 
clude that the two communications 
had different effects. 

To ascertain whether the com- 
munications had different effects, 
some statistical test must be ap- 
plied. In some cases, to be sure, the 
effects may be so strong that the test 
can be made by inspection. In most 
cases, however, some more objective 
method is necessary. An obvious 
procedure would be to assign the 
numbers 1 to 7, say, to the rating 
scale categories and apply the F test, 
at least if the data presented some 
semblance of equinormality. How-
ever, some writers on statistics (e.g.,
Siegel, 1956; Senders, 1958) would 
object to this on the ground that the 
rating scale is only an ordinal scale, 
the data are therefore not “truly
numerical,” and hence that the 
operations of addition and multipli- 
cation which are used in computing F 
cannot meaningfully be applied to 
the scores. There are three different 
questions involved in this objection,
and much of the controversy over 
scales and statistics has arisen from 
a failure to keep them separate. Ac- 
cordingly, these three questions will 
be taken up in turn. 

Question 1. Can the F test be ap-
plied to data from an ordinal scale? It
is convenient to consider two cases of 
this question according as the as- 
sumption of equinormality is satis- 
fied or not. Suppose first that equi- 
normality obtains. The caveat 
against parametric statistics has been 
stated most explicitly by Siegel 
(1956) who says: 
The conditions which must be satisfied . . . 
before any confidence can be placed in any 
probability statement obtained by the use of 
the t test are at least these: . . . 4. The vari-
ables involved must have been measured in at 
least an interval scale . . . (p. 19). (By permis- 
sion, from Nonparametric Statistics, by 
S. Siegel. Copyright, 1956. McGraw-Hill 
Book Company, Inc.) 

This statement of Siegel’s is com- 
pletely incorrect. This particular 
question admits of no doubt whatso- 
ever. The F (or t) test may be 
applied without qualm. It will then 
answer the question which it was de- 
signed to answer: can we reasonably 
conclude that the difference between 
the means of the two groups is real 
rather than due to chance? The 
justification for using F is purely 
statistical and quite straightfor-
ward; there is no need to waste space
on it here. The reader who has 
doubts on the matter should postpone 
them to the discussion of the two 
subsequent questions, or read the 
elegant and entertaining article by 
Lord (1953). As Lord points out, 
the statistical test can hardly be 
cognizant of the empirical meaning of 
the numbers with which it deals. 
Consequently, the validity of a sta- 
tistical inference cannot depend on 
the type of measuring scale used. 


The case in which equinormality 
does not hold remains to be consid- 
ered. We may still use F, of course, 
and as has been seen in the first part, 
we would still have about the same 
significance level in most cases. The 
F test might have less power than a 
rank order test so that the latter 
might be preferable in this simple two 
group experiment. However, insofar 
as we wish to inquire into the relia- 
bility of the difference between the 
measured behavior of the two groups 
in our particular experiment, the 
choice of statistical test would be 
governed by purely statistical consid- 
erations and have nothing to do with 
scale type. 

Question 2. Will statistical results be 
invariant under change of scale? The 
problem of invariance of result stems 
from the work of Stevens (1951) who 
observes that a statistic computed on 
data from a given scale will be invari- 
ant when the scale is changed accord- 
ing to any given permissible transfor- 
mation. It is important to be precise 
about this usage of invariance. It 
means that if a statistic is computed 
from a set of scale values and this 
statistic is then transformed, the 
identical result will be obtained as 
when the separate scale values are 
transformed and the statistic is com- 
puted from these transformed scale 
values. 
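The definition of invariance just given can be written as a small check: transform the statistic, compute the statistic on the transformed scale values, and compare. In the sketch below the mean passes the check under a linear transformation but fails under a monotone nonlinear one, while the median passes under the monotone transformation. The data are hypothetical and deliberately have an odd number of values, because with an even number the median averages two middle values and need not pass the check.

```python
import math
from statistics import mean, median


def commutes(stat, transform, data):
    """Invariance in the sense above: transforming the statistic gives the same
    value as computing the statistic on the transformed scale values."""
    return math.isclose(transform(stat(data)), stat([transform(x) for x in data]))


def linear(x):
    return 3 * x + 10


def cube(x):
    return x ** 3  # monotone increasing, but not linear


data = [1, 2, 4, 7, 11]  # hypothetical scale values; odd n
```

For example, `commutes(mean, cube, data)` is false (cubing the mean gives 125, while the mean of the cubes is 349.4), but `commutes(median, cube, data)` is true.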

Now our scale of attitude toward 
the church is admittedly only an 
ordinal scale. Consequently, we 
would expect it to change in the
direction of an interval scale in future 
work. Any such scale change would 
correspond to a monotone transfor- 
mation of our original scale since only 
such transformations are permissible 
with an ordinal scale. Suppose then 
that a monotone transformation of 
the scale has been made subsequent 
to the experiment on attitude change. 
We would then have two sets of data:
the responses as measured on the 
original scale used in the experiment, 
and the transformed values of these 
responses as measured on the new, 
transformed scale. (Presumably, 
these transformed scale values would 
be the same as the subjects would 
have made had the new scale been 
used in the original experiment, al- 
though this will no doubt depend on 
the experimental basis of the new 
scale.) The question at issue then 
becomes whether the same signifi- 
cance results will be obtained from 
the two sets of data. If rank order 
tests are used, the same significance 
results will be found in either case 
because any permissible transforma- 
tion leaves rank order unchanged. 
However, if parametric tests are 
employed, then different significance 
statements may be obtained from the 
two sets of data. It is possible to get a 
significant F from the original data 
and not from the transformed data, 
and vice versa. Worse yet, it is even 
logically possible that the means of 
the two groups will lie in reverse order 
on the two scales. 
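The possibility of a reversal of means, together with the stability of a rank statistic, can be shown with a tiny hypothetical example. Taking logarithms is a monotone transformation, so it is permissible for an ordinal scale. The Mann-Whitney U count below depends only on the order of the observations and is therefore unchanged.

```python
import math


def mean(values):
    return sum(values) / len(values)


def mann_whitney_u(x, y):
    """Count of (x, y) pairs with x > y, ties counted as 1/2. Depends only on order."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Hypothetical responses on the original scale.
group_a = [1.0, 2.0, 10.0]
group_b = [4.0, 4.0, 4.0]

# The same responses after a monotone (order-preserving) change of scale.
log_a = [math.log(v) for v in group_a]
log_b = [math.log(v) for v in group_b]
```

Group A has the higher mean on the original scale (4.33 versus 4) and the lower mean on the log scale (about 1.00 versus 1.39), while U is 3 on both scales.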

The state of affairs just described is 
clearly undesirable. If taken uncriti- 
cally, it would constitute a strong 
argument for using only rank order 
tests on ordinal scale data and re- 
stricting the use of F to data obtained 
from interval scales. It is the purpose 
of this section to show that this con- 
clusion is unwarranted. The basis of 
the argument is that the naming of 
the scales has begged the psychologi- 
cal question. 

Consider interval scales first, and 
imagine that two students, P and Q, 
in an elementary lab course are as- 
signed to investigate some process. 
This process might be a ball rolling on 
a plane, a rat running an alley, or a
child doing sums. The students
cooperate in the experimental work,
making the same observations, except 
that they use different measuring 
scales. P decides to measure time 
intervals. He reasons that it makes 
sense to speak of one time interval as 
being twice another, that time inter- 
vals therefore form a ratio scale, and 
hence a fortiori an interval scale. Q 
decides to measure the speed of the 
process (feet per second, problems per 
minute). By the same reasoning as 
used by P, Q concludes that he has an 
interval scale also. Both P and Q are 
aware of current strictures about
scales and statistics. However, since
each believes (and rightly so) that he
has an interval scale, each uses means
and applies parametric tests in writ-
ing his lab report. Nevertheless,
when they compare their reports they
find considerable difference in their
descriptive statistics and graphs (Fig-
ure 1), and in their F ratios as well.
Consultation with a statistician shows
that these differences are direct con-
sequences of the difference in the
measuring scales. Evidently then,
possession of an interval scale does
not guarantee invariance of interval
scale statistics.

[Figure 1: two panels, TIME (P's data) and SPEED (Q's data)]
Fig. 1. Temporal aspects of some process obtained from a 2×2 design. (The data are plotted as a function of Variable A with Variable B as a parameter. Subscripts denote the two levels of each variable. Note that Panel P shows an interaction, but that Panel Q does not.)
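The situation of P and Q can be imitated with a hypothetical 2×2 table of cell values. Speeds are chosen to be exactly additive, so the interaction contrast on the speed scale is zero; converting each cell to time by taking reciprocals produces a nonzero interaction. For simplicity each cell is treated as a single value; with real data the reciprocal would be taken of each subject's score before averaging, since the mean of reciprocals is not the reciprocal of the mean.

```python
def interaction_contrast(cell):
    """Interaction contrast for a 2x2 table: (A1B1 - A2B1) - (A1B2 - A2B2).
    Rows index levels of Variable A, columns index levels of Variable B."""
    return (cell[0][0] - cell[1][0]) - (cell[0][1] - cell[1][1])


# Hypothetical speeds (e.g., problems per minute): the A and B effects add.
speed = [[1.0, 2.0],
         [2.0, 3.0]]

# The same observations expressed as time per problem (reciprocal of speed).
time = [[1.0 / s for s in row] for row in speed]

speed_interaction = interaction_contrast(speed)  # 0.0
time_interaction = interaction_contrast(time)    # 1/3
```

On the speed scale the two effects are exactly additive; on the time scale the effect of A depends on the level of B, as in Panel P.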

For ordinal scales, we would expect 
to obtain invariance of result by using 
ordinal scale statistics such as the 
median (Stevens, 1951). Let us sup- 
pose that some future investigator 
finds that attitude toward the church 
is multidimensional in nature and 
has, in fact, obtained interval scales 
for each of the dimensions. In some 
of his work he chanced to use our 
original ordinal scale so that he was 
able to find the relation between this 
ordinal scale and the multidimen- 
sional representation of the attitude. 
His results are shown in Figure 2. 
Our ordinal scale is represented by 
the curved line in the plane of the two 
dimensions. Thus, a greater distance 
from the origin as measured along the 
line stands for a higher value on our 
ordinal scale. Points A and B on the 
curve represent the medians of Groups 
A and B in our experiment, and it is 
seen that Group A is more pro-church 
than Group B on our ordinal scale. 
The median scores for these two 
groups on the two dimensions are 
obtained simply by projecting Points 
A and B onto the two dimensions. All 
is well on Dimension 2 since there 
Group A is greater than Group B. On 
Dimension 1, however, a reversal is 
found: Group A is less than Group B,
contrary to our ordinal scale results.
Evidently then, possession of an
ordinal scale does not guarantee
invariance of ordinal scale statistics.

[Figure 2: a curved line plotted against Dimension 1 and Dimension 2, with Points A and B on the curve]
Fig. 2. The curved line represents the ordinal scale of attitude toward the church plotted in the two-dimensional [illegible] the attitude. (Points A and B denote the medians of two experimental groups. The graph is hypothetical, of course.)
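A hypothetical curve of the kind shown in Figure 2 can be written down explicitly. In the sketch, a position t on the ordinal scale maps to the point (t(3 − t), t) for t between 0 and 3. Dimension 2 rises steadily with t, so distance along the curve increases with t, but Dimension 1 rises and then falls. Two medians that are correctly ordered on the ordinal scale therefore project in reverse order onto Dimension 1. The curve and the two positions are invented for illustration.

```python
def curve(t):
    """Hypothetical mapping from a position t (0 <= t <= 3) on the ordinal scale
    to (Dimension 1, Dimension 2); Dimension 1 rises and then falls as t increases."""
    return t * (3 - t), t


# Medians of the two groups, expressed as positions on the ordinal scale.
t_a, t_b = 2.5, 1.5
a_dim1, a_dim2 = curve(t_a)  # (1.25, 2.5)
b_dim1, b_dim2 = curve(t_b)  # (2.25, 1.5)
```

Group A is higher than Group B on the ordinal scale and on Dimension 2, but lower on Dimension 1, because Dimension 1 is not a monotone function of position on the ordinal scale.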
A rather more drastic loss of invari- 
ance would occur if the ordinal scale 
were measuring the resultant effect of 
two or more underlying processes. 
This could happen, for instance, in 
the study of approach-avoidance 
conflict, or ambivalent behavior, as 
might be the case with attitude 
toward the church. In such situa-
tions, two people could give identical
responses on the one-dimensional scale
and yet be quite different as regards
the two underlying processes. For
instance, the same resultant could 


senting such data in the space formed 
by the underlying dimensions would 
yield a smear of points over an entire 
region rather than a simple curve as
in Figure 2. 
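A minimal sketch of the ambivalence case follows. It assumes, only for illustration (the text does not specify a combination rule), that the observed one-dimensional response is the difference between an approach tendency and an avoidance tendency; the function `resultant` and the strengths used are hypothetical.

```python
def resultant(approach, avoidance):
    """Hypothetical combination rule: net tendency = approach minus avoidance."""
    return approach - avoidance


# Two hypothetical people with very different underlying processes.
person_1 = {"approach": 2, "avoidance": 1}  # weak conflict
person_2 = {"approach": 6, "avoidance": 5}  # strong conflict

r1 = resultant(**person_1)
r2 = resultant(**person_2)
print(r1, r2)  # 1 1: identical on the one-dimensional scale
```

Under this rule every person with the same resultant lies on one line in the (approach, avoidance) plane, so a sample with varying resultants spreads over a region of the plane rather than along a single curve.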

Although it may be reasonable to 
think that simple sensory phenomena 
are one-dimensional, it would seem 
that a considerable number of psy- 
chological variables must be con- 
ceived of as multidimensional in 
nature as, for instance, with “IQ” 
and other personality variables. Ac- 
cordingly, as the two cited examples 
show, there is no logical guarantee 
that the use of ordinal scale statistics 
will yield invariant results under scale 
changes. 

It is simple to construct analogous 
examples for nominal scales. How- 
ever, their only relevance would be to 
show that a reduction of all results to 
categorical data does not avoid the 
difficulty with invariance. 

It will be objected, of course, that 
the argument of the examples has 
violated the initial assumption that 
only ‘‘permissible’’ transformations 
would be used in changing the meas- 
uring scales. Thus, speed and time 
are not linearly related, but rather 
the one is a reciprocal transformation 
of the other. Similarly, Dimension 1 
of Figure 2 is no monotone transfor- 
mation of the original ordinal scale. 
This objection is correct, to be sure, 
but it simply shows that the problem 
of invariance of result with which one 
is actually faced in science has no 
particular connection with the invari- 
ance of “permissible” statistics. The 
examples which have been cited show 
that knowing the scale type, as deter- 
mined by the commonly accepted 
criteria, does not imply that future 
scales measuring the same phenom- 
ena will be “permissible” transfor- 
mations of the original scale. Hence 
the use of “permissible” statistics, 
although guaranteeing invariance of
result over the class of “permissible”
transformations, says little about
invariance of result over the class of
scale changes which must actually be 
considered by the investigator in his 
work. 

This point is no doubt pretty obvi- 
ous, and it should not be thought that 
those who have taken up the scale- 
type ideas are unaware of the prob- 
lem. Stevens, at least, seems to ap- 
preciate the difficulty when, in the 
concluding section of his 1951 article, 
he distinguishes between psychologi- 
cal dimensions and indicants. The 
former may be considered as inter- 
vening variables whereas the latter 
are effects or correlates of these vari- 
ables. However, it is evident that an 
indicant may be an interval scale in 
the customary sense and yet bear a 
complicated relation to the underly- 
ing psychological dimensions. In such 
cases, no procedure of descriptive or 
inferential statistics can guarantee in-
variance over the class of scale changes
which may become necessary. 

It should also be realized that only 
a partial list of practical problems of 
invariance has been considered. Ef- 
fects on invariance of improvements 
in experimental technique would also 
have to be taken into account since 
such improvements would be ex- 
pected to purify or change the de- 
pendent variable as well as decrease 
variability. There is, in addition, a 
problem of invariance over subject 
population. Most researches are 
based on some handy sample of sub- 
jects and leave more or less doubt 
about the generality of the results. 
Although this becomes in large part 
an extrastatistical problem (Wilk & 
Kempthorne, 1955), it is one which 
assumes added importance in view of 
Cronbach’s (1957) emphasis on the 
interaction of experimental and sub- 
ject variables. In the face of these 
assorted difficulties, it is not easy to 
see what utility the scale typology 
has for the practical problems of the
investigator. 

The preceding remarks have been 
intended to put into broader perspec- 
tive that sort of invariance which is 
involved in the use of permissible 
statistics. They do not, however, 
solve the immediate problem of 
whether to use rank order tests or F 
in case only permissible transforma- 
tions need be considered. Although 
invariance under permissible scale 
transformations may be of relatively 
minor importance, there is no point in 
taking unnecessary risks without the 
possibility of compensation. 

On this basis, one would perhaps 
expect to find the greatest use of rank 
order tests in the initial stages of 
inquiry since it is then that measuring 
scales will be poorest. However, it is 
in these initial stages that the possi- 
bly relevant variables are not well- 
known so that the stronger experi- 
mental designs, and hence paramet- 
ric procedures, are most needed. 
Thus, it may well be most efficient to 
use parametric tests, balancing any 
risk due to possible permissible scale 
changes against the greater power 
and versatility of such tests. In the 
later stages of investigation, we 
would be generally more sure of the 
scales and the use of rank order proce- 
dures would waste information which 
the scales by then embody. 

At the same time, it should be 
realized that even with a relatively 
crude scale such as the rating scale of 
attitude toward the church, the 
possible permissible transformations 
which are relevant to the present 
discussion are somewhat restricted. 
Since the F ratio is invariant under 
change of zero and unit, it is no re- 
striction to assume that any trans- 
formed scale also runs from 1 to 7. 
This imposes a considerable limita-
tion on the permissible scale transfor-
mations which must be considered.
In addition, whatever psychological 
worth the original rating scale pos- 
sesses will limit still further the trans- 
formations which will occur in prac- 
tice. 
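The claim that F is unchanged by a change of zero and unit, while a monotone but nonlinear rescaling generally alters it, can be checked directly. The ratings and the rescaling `bowed` below are hypothetical; `bowed` is pinned so that the transformed scale still runs from 1 to 7, as the argument assumes.

```python
from statistics import mean


def one_way_f(groups):
    """One-way analysis of variance F ratio: between-groups MS / within-groups MS."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    k, n_total = len(groups), len(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical ratings on a 1-to-7 scale for two groups.
ratings = [[1, 2, 2, 3], [3, 4, 6, 7]]


def linear(x):
    """Change of zero and unit (a permissible interval-scale transformation)."""
    return 10 * x + 5


def bowed(x):
    """Hypothetical monotone, nonlinear rescaling that still maps 1 -> 1 and 7 -> 7."""
    return 1 + (x - 1) ** 2 / 6


f_raw = one_way_f(ratings)
f_linear = one_way_f([[linear(x) for x in g] for g in ratings])
f_bowed = one_way_f([[bowed(x) for x in g] for g in ratings])
print(f_raw, f_linear, f_bowed)
```

The linear change leaves F exactly as it was; the bowed rescaling, though order-preserving and confined to the same 1-to-7 range, does not.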

Although rank order tests do pos- 
sess some logical advantage over 
parametric tests when only permissi- 
ble transformations are considered, 
this advantage is, in the writer's 
opinion, very slight in practice and 
does not begin to balance the greater 
versatility of parametric procedures. 
The problem is, however, an empiri- 
cal one and it would seem that some 
historical analysis is needed to pro- 
vide an objective frame of reference. 
To quote an after-lunch remark of K. 
MacCorquodale, “Measurement the- 
ory should be descriptive, not pro- 
scriptive, nor prescriptive.” Such an 
inquiry could not fail to be fascinat- 
ing because of the light it would 
throw on the actual progress of 
measurement in psychology. One 
investigation of this sort would prob- 
ably be more useful than all the 
speculation which has been written 
on the topic of measurement. 

Question 3. Will the use of paramet- 
ric as opposed to nonparametric 
statistics affect inferences about under-
lying psychological processes? In a
narrow sense, Question 3 is irrelevant
to this article since the inferences in 
question are substantive, relating to 
psychological meaning, rather than 
formal, relating to data reliability. 
Nevertheless, it is appropriate to 
discuss the matter briefly in order to 
make explicit some of the considera- 
tions involved because they are often 
confused with problems arising under 
the two previous questions. With no 
pretense of covering all aspects of this 
question, the following two examples 
will at least touch some of the problems.

The first example concerns the two
students, P and Q, mentioned above, 
who had used time and speed as 
dependent variables. We suppose 
that their experiment was based on a 
2×2 design and yielded means as
plotted in Figure 1. This graph por- 
trays main effects of both variables 
which are seen to be similar in na- 
ture in both panels. However, our 
principal concern is with the inter- 
action which may be visualized as 
measuring the degree of nonparallel- 
ism of the two lines in either panel. 
Panel P shows an interaction. The 
reciprocals of these same data, plotted 
in Panel Q, show no interaction. It is 
thus evident in the example, and true 
in general, that interaction effects 
will depend strongly on the measur- 
ing scales used. 
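For a 2×2 design the interaction can be summarized by the contrast m11 - m12 - m21 + m22 of the four cell means, which is zero exactly when the two lines are parallel. The cell means below are hypothetical, chosen so that the speeds are additive; the corresponding times are their reciprocals.

```python
from fractions import Fraction


def interaction_contrast(m):
    """2x2 interaction contrast m11 - m12 - m21 + m22; zero means parallel lines."""
    return m[0][0] - m[0][1] - m[1][0] + m[1][1]


# Hypothetical cell means on the speed scale (additive: no interaction).
speed = [[Fraction(1), Fraction(2)],
         [Fraction(2), Fraction(3)]]
# The same cells expressed as times (reciprocals of speed).
time = [[1 / s for s in row] for row in speed]

print(interaction_contrast(speed))  # 0
print(interaction_contrast(time))   # 1/3
```

Strictly, the mean of reciprocals is not the reciprocal of a mean; the sketch treats each cell as a single value to keep the arithmetic exact.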

Assessing an interaction does not 
always cause trouble, of course. Had 
the lines in Panel P, say, crossed each 
other, it would not be likely that any 
change of scale would yield uncrossed
lines. In many cases also, the scale
is sufficient for the purposes at
hand and future scale changes need 
not be considered. Nevertheless, it is 
clear that a measure of caution will 
often be needed in making inferences 
from interaction to psychological 
process. If the investigator envisages 
the possibility of future changes in 
the scale, he should also realize that a 
present inference based on significant 
interaction may lose credibility in the 
light of the rescaled data. 
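Why crossed lines resist rescaling: if one line lies above the other at one level and below it at the other level, a strictly increasing transformation preserves both orderings, and a strictly decreasing one reverses both, so the lines still cross. The sketch checks this for a few arbitrarily chosen monotone rescalings of hypothetical cell means.

```python
import math


def lines_cross(m):
    """True if the difference between the two lines changes sign across the two columns."""
    left = m[0][0] - m[1][0]
    right = m[0][1] - m[1][1]
    return left * right < 0


# Hypothetical 2x2 cell means (rows: the two lines; columns: levels of the other factor).
means = [[2.0, 5.0],
         [4.0, 3.0]]

# Strictly monotone rescalings, chosen arbitrarily for illustration.
rescalings = {
    "identity": lambda x: x,
    "log": math.log,
    "square root": math.sqrt,
    "cube": lambda x: x ** 3,
    "reciprocal (decreasing)": lambda x: 1 / x,
}
for name, f in rescalings.items():
    rescaled = [[f(x) for x in row] for row in means]
    print(f"{name}: lines cross = {lines_cross(rescaled)}")
```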

It is certainly true that the inter- 
pretation of interactions has some- 
times led to error. It may also be 
noted that the usual factorial design 
analysis is sometimes incongruent 
with the phenomena. In a 2×2 de-
sign it might happen, for example,
that three of the four cell means are 
equal. The usual analysis is not 
optimally sensitive to this one real 
difference since it is distributed over 
three degrees of freedom. In such
cases, there will often be other para-
metric tests involving specific com-
parisons (Snedecor, 1956) or multiple
comparisons (Duncan, 1955) which are
more appropriate. Occasionally also, 
an analysis of variance based on a 
multiplicative model (Williams, 1952) 
will be useful (Jones & Marcus, 1961). 
A judicious choice of test may be of 
great help in dissecting the results. 
However, the test only answers set 
questions concerning the reliability of 
the results; only the research worker 
can say which questions are appropri- 
ate and meaningful. 
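The arithmetic behind the three-equal-means case can be made explicit with single-degree-of-freedom contrasts. Suppose, hypothetically, that three cell means equal 0 and the fourth equals d, with n observations per cell. The sum of squares for a contrast with coefficients c_i is n(Σ c_i m_i)² / Σ c_i². The sketch shows that the usual A, B, and A×B contrasts each receive one third of the between-cells sum of squares, whereas a specific comparison setting the odd cell against the other three captures all of it in one degree of freedom. The values n = 5 and d = 4 are arbitrary.

```python
from fractions import Fraction


def contrast_ss(coefs, means, n):
    """Sum of squares for a single-df contrast: n * (sum c*m)^2 / sum c^2."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    return n * estimate ** 2 / sum(c * c for c in coefs)


n, d = 5, Fraction(4)
# Cell order: a1b1, a1b2, a2b1, a2b2; only a2b2 differs.
means = [Fraction(0), Fraction(0), Fraction(0), d]

grand = sum(means) / 4
ss_cells = n * sum((m - grand) ** 2 for m in means)  # 3 df between cells

factorial = {
    "A":   [-1, -1, 1, 1],
    "B":   [-1, 1, -1, 1],
    "AxB": [1, -1, -1, 1],
}
for name, coefs in factorial.items():
    print(name, contrast_ss(coefs, means, n))         # each gets ss_cells / 3

specific = [-1, -1, -1, 3]                           # odd cell vs the other three
print("specific", contrast_ss(specific, means, n))   # all of ss_cells
print("between cells", ss_cells)
```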

Inferences based on nonparamet- 
ric tests of interaction would pre- 
sumably be less sensitive to certain 
types of scale changes. However, 
caution would still be needed in the 
interpretation as has been seen in 
Question 2. The problem is largely 
academic, however, since few non-
parametric tests of interaction exist.*
It might be suggested that the ques- 
tion of interaction cannot arise when 
only the ordinal properties of the 
data are considered since the interac- 
tion involves a comparison of differ- 
ences and such a comparison is illegit- 
imate with ordinal data. To the 
extent that this suggestion is correct, 
a parametric test can be used to the 
same purposes equally well if not 
better; to the extent that it is not cor- 
rect, nonparametric tests will waste 
information. 

One final comment on the first
example deserves emphasis. Since
both time and speed are interval
scales, it cannot be argued that the
difficulty in interpretation arises
because we had only ordinal scales.

* There is a nomenclatural difficulty here.
Strictly speaking, nonparametric tests should
be called more-or-less distribution free tests.
For example, the Mood-Brown generalized
median test (Mood, 1950) is distribution free,
but is based on a parametric model of the same
sort as in the analysis of variance. As noted
in the introduction, the usual terminology is
used in this article.


The second example, suggested by 
J. Kaswan, is shown in Figure 3. The 
graph, which is hypothetical, plots 
amount of aggressiveness as a func- 
tion of amount of stress. A glance at 
the graph leads immediately to the 


inference that some sort of threshold 
effect is present. Under increasing 
stress, the organism remains quies- 
cent until the stress passes a certain 
threshold value, whereupon the or-
ganism leaps into full-scale aggressive
behavior. 

Confidence in this interpretation is 
shaken when we stop to consider that 
the scales for stress and aggression 
may not be very good. Perhaps, 
when future work has given us im- 
proved scales, these same data would 
yield a quite different function such 
as a straight line. 
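That a monotone rescaling can remove an apparent threshold is easy to demonstrate. Suppose, hypothetically, that the observed aggressiveness scores follow a steep logistic function of stress. Re-expressing the aggressiveness scale through the logit, which is strictly increasing, turns the same data into an exact straight line. Both the logistic form and the logit rescaling (and the parameter values) are assumptions chosen for illustration.

```python
import math


def logistic(x, midpoint=5.0, steepness=3.0):
    """Hypothetical S-shaped 'threshold' curve of aggressiveness against stress."""
    return 1 / (1 + math.exp(-steepness * (x - midpoint)))


def logit(p):
    """Strictly increasing rescaling of a score lying in (0, 1)."""
    return math.log(p / (1 - p))


stress = [float(s) for s in range(0, 11)]
aggression = [logistic(s) for s in stress]
rescaled = [logit(a) for a in aggression]

# Successive differences: very uneven on the original scale, constant after rescaling.
raw_steps = [b - a for a, b in zip(aggression, aggression[1:])]
new_steps = [b - a for a, b in zip(rescaled, rescaled[1:])]
print([round(s, 3) for s in raw_steps])
print([round(s, 3) for s in new_steps])  # every step equals the steepness, 3.0
```

The rank order of the points is identical on both scales; only the spacing, and hence the apparent threshold, changes.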

One extreme position regarding the 
threshold effect would be to say that 
the scales give rank order information 
and no more. The threshold infer- 
ence, or any inference based on char- 
acteristics of the curve shape other 
than the uniform upward trend, 
would then be completely disallowed. 
At the other extreme, there would be 
complete faith in the scales and all 
inferences based on curve shape, 
including the threshold effect, would 
be made without fear that they would 
be undermined by future changes in 
the scales. In practice, one would 
probably adopt a position between 
these two extremes, believing, with 
Mosteller (1958), that our scales 
generally have some degree of numer- 
ical information worked into them, 
and realizing that to consider only the 
rank order character of the data 
would be to ignore the information 
that gives the strongest hold on the 
behavior. 

From this ill-defined middle ground,
inferences such as the threshold effect
would be entertained as guides to
future work. Such inferences, how-
ever, are made at the judgment of the
investigator. Statistical techniques
may be helpful in evaluating the
reliability of such features of the
data, but only the investigator can
endow them with psychological
meaning.

Fic. 3. Aggressiveness plotted as a function
of stress. (The curve is hypothetical. Note
the hypothetical threshold effect.)


situations in which the dependent
variable is ..., thus excluding
strictly categorical data. ... it
was noted that the difference between
parametric and rank order tests was
not great insofar as significance level
and power were concerned. However,
only the versatility of parametric
statistics meets the everyday needs of
psychological research. It was con-
cluded that parametric procedures
are the standard tools of psychological
statistics although nonparametric
procedures are useful minor tech-
niques.

Under the heading of measurement
theoretical considerations, three ques-
tions were distinguished. The well-known
fact that an interval scale is
not prerequisite to making a statisti- 
cal inference based on a parametric 
test was first pointed out. The second 
question took up the important 
problem of invariance. It was noted 
that the practical problems of invari- 
ance or generality of result far trans- 
cend measurement scale typology. In 
addition, the cited example of time
and speed showed that interval scales 
of a given phenomenon are not 
unique. The discussion of the third 
question noted that the problem of 
psychological meaning is not basi- 
cally a statistical matter. It was thus 
concluded that the type of measuring 
scale used had little relevance to the 
question of whether to use paramet- 
ric or nonparametric tests. 

BASIC FORMS OF COVARIATION AND
CONCOMITANCE DESIGNS 


RICHARD W. COAN 
University of Arizona 


Several years ago, Cattell (1946) 
published a description of what he 
called the “covariation chart,” a 
graphic model which illustrates six 
basic forms of covariation with which 
we may deal in psychological re- 
search. It is the purpose of the 
present paper to describe an exten- 
sion and modification of Cattell’s 
schema that will provide a much more
comprehensive classification of actual 
and possible research designs in 
psychology. 

The six forms of covariation en- 
compassed by Cattell’s model have 
been variously labeled with letters 
through the alphabetic range from M 
to T, but the labeling indicated in 
Table 1 has come to be reasonably 
standard. The covariation chart 
itself consists of a parallelepiped, in 
which the three dimensions represent 
tests, persons, and occasions. Any 
plane parallel to any surface of the 
model represents a score matrix 
which might correspond to the data 
from a psychological research. There 
are three such sets of planes, any one
plane permitting consideration of two
kinds of covariation.


TABLE 1

THE SIX BASIC FORMS OF COVARIATION
INDICATED IN THE COVARIATION CHART

Technique  Variables    Series over which  Variables held constant
           correlated   correlated         or singular
R          Tests        Persons            Occasions
Q          Persons      Tests              Occasions
P          Tests        Occasions          Persons
O          Occasions    Tests              Persons
S          Persons      Occasions          Tests
T          Occasions    Persons            Tests
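The six techniques of Table 1 are exactly the ways of assigning the three dimensions of the chart to the three roles: variables correlated, series over which they are correlated, and the dimension held constant. A short enumeration (the dictionary is transcribed from Table 1; the role ordering within each tuple is an illustrative convention) confirms that there are 3! = 6 such assignments and pairs each technique with the one sharing its plane of the chart.

```python
from itertools import permutations

dimensions = ("Tests", "Persons", "Occasions")

# Table 1, transcribed: technique -> (correlated, series, held constant).
table_1 = {
    "R": ("Tests", "Persons", "Occasions"),
    "Q": ("Persons", "Tests", "Occasions"),
    "P": ("Tests", "Occasions", "Persons"),
    "O": ("Occasions", "Tests", "Persons"),
    "S": ("Persons", "Occasions", "Tests"),
    "T": ("Occasions", "Persons", "Tests"),
}

all_assignments = set(permutations(dimensions))
print(len(all_assignments))                      # 6
print(set(table_1.values()) == all_assignments)  # True: the six forms are exhaustive

# Each technique and its transpose (correlated and series swapped) share one plane of the chart.
for name, (corr, series, const) in table_1.items():
    partner = next(k for k, v in table_1.items() if v == (series, corr, const))
    print(f"{name} <-> {partner} (constant: {const})")
```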

The major virtue of a classification 
scheme like that embodied in the 
covariation chart is that it can sug- 
gest forms of valuable research which 
might otherwise be overlooked. As 
Cattell himself has clearly recog- 
nized, however, the scope of the 
covariation chart model has certain 
unfortunate limitations. When he 
first presented the covariation chart, 
Cattell pointed out that the six tech- 
niques did not really exhaust the 
forms of covariation inherently de- 
rivable from the three-dimensional 
model. The other forms which he 
considered at that time, however, are 
essentially variants or compounds of 
the six basic forms of covariation. 

Various more novel techniques will 
emerge, of course, if we can find 
justification for adding other dimen- 
sions to the model. In a more recent 
publication, Cattell (1957) points out 
that a psychological event may be 
characterized in terms of six inde- 
pendent “tags”: a reacting organism, 
a focal stimulus, a background condi- 
tion, a response, an occasion in time 
and space, and an observer. He sug- 
gests that any pair of tags may serve 
as the dimensions of a score matrix 
yielding a technique and its trans- 
pose. Since there are 15 possible pairs 
of tags, there are 15 possible tech- 
niques (and their corresponding 
transposes). Furthermore, the ele- 
ments within any matrix could corre- 
spond to any of the six tags. Logi- 
cally, this would extend the system to 
90 possible techniques (or 180, includ-
ing transposed techniques). Cattell
apparently excludes some combina- 
tions and speaks of 45 possible tech- 
niques. To these he adds five addi- 
tional possibilities that involve a 
mixture of tags along one axis of the 
score matrix. 
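The counts in this paragraph follow from simple combinatorics: six tags give C(6, 2) = 15 unordered pairs of matrix axes, and letting the matrix elements correspond to any of the six tags multiplies by 6. The sketch reproduces 15, 90, and 180; it does not attempt Cattell's figure of 45, since the text does not say which combinations he excludes.

```python
from math import comb

tags = ["reacting organism", "focal stimulus", "background condition",
        "response", "occasion", "observer"]

pairs = comb(len(tags), 2)           # unordered pairs of tags as matrix axes
with_elements = pairs * len(tags)    # elements may correspond to any of the six tags
with_transposes = 2 * with_elements  # each technique has a transpose

print(pairs, with_elements, with_transposes)  # 15 90 180
```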

In the view of the writer, the 
original covariation chart provides 
too limited a classification system. 
The extended model, however, intro- 
duces needless complexity and is 
subject to useful modification and 
simplification. Cattell’s six tags 
represent six distinguishable aspects 
of any observed psychological event, 
but they do not, on that account, 
constitute six meaningfully distin- 
guishable aspects of research design. 

The distinction between focal stim- 
ulus and background condition is a 
somewhat arbitrary one, and its use- 
fulness in design classification is 
questionable. We can nearly always 
isolate a great variety of stimulus 
variables that will influence a given 
event in a more or less direct way. 
Insofar as the researcher analyzes the 
effect of one of these variables, it 
becomes a focal stimulus variable, at 
least from the standpoint of the 
researcher and hence of the research 
design. Background conditions are 
otherwise irrelevant to experimental 
design, unless they are confounded 
with other kinds of variables (organ- 
isms or occasions). 

The observer is also a vital part of 
any psychological event dealt with in 
research, but the observer becomes 
important as a component of design 
only to the extent that he is some- 
thing more than an observer. If his 
presence in the situation affects the 
behavior of the subject of the experi- 
ment, the observer becomes to that 
extent a part of the stimulus situation 
and may be analyzed accordingly. If 
our interest, on the other hand, is in 
peculiarities of the observer as a re-
corder or rater of behavior, we are to
that extent treating him as a reacting 
organism, i.e., as the subject of an ex- 
periment superimposed on another 
experiment. 


BASIC COMPONENTS OF DESIGN
IN PSYCHOLOGICAL RESEARCH 


It is possible to characterize a 
psychological event in terms of a 
great number of distinguishable fea- 
tures which set it apart from other 
psychological events, but there are 
basically only four such features that 
constitute essential and distinguish- 
able parameters of any research de- 
sign employed to study psychological 
events. We shall refer to these fea- 
tures henceforth as design components 
and label them R, S, P, and O (not to 
be confused with Techniques R, S, P,
and O).

Design Component R is that realm 
of variables which consists of struc- 
tural or functional manifestations on 
the part of the subject or subjects 
under investigation and which are 
studied through observation and 
measurement of the subject or of 
products of the subject’s behavior. 
Commonly treated as single R-com- 
ponent variables are specific re- 
sponses, score summaries of patterns 
or sets of responses, and attributes. 
Design Component S is that realm of 
variables which arises from sources 
outside the subject and which may be 
expected to influence the subject's 
behavior. S, then, refers to external 
stimuli. Those things which are 
sometimes called “internal stimuli” 
fall within the scope of R-component 
variables if they are directly ob- 
served or measured. The P com- 
ponent is that of the human or ani- 
mal subjects observed in the experi- 
ment. The O component is the 
realm of occasions, in given time and 
space, on which experimental ob-
servations are made.

These four components are ordi- 
narily quite distinct from one another 
and subject to separate specification. 
For some purposes, we may artifi- 
cially tie variables of one component 
to those of another. In such cases, we 
may speak of a “confounding” of de- 
sign components. Confounding is 
most common with respect to Com- 
ponent O, which for various purposes 
we permit to vary systematically 
with certain S, P, or R variables. A 
confounding of S and P variables is 
also quite common. 

In a sense, any variable that we 
observe and describe may be said to 
be measured, at least implicitly, for 
if our description contains only an 
identifying qualitative statement, we 
have provided the essential ingredi- 
ents of nominal scaling. Since the 
variables of all four design compo- 
nents are subject to observation and 
description in a psychological experi- 
ment, they may be regarded as sub- 
jected simultaneously and independ- 
ently to measurement and scaling. 
Within any component, variables 
may be scaled at any level—nominal, 
ordinal, interval, or ratio—and are 
sometimes simultaneously scaled at 
more than one level. 

A peculiarity of Component P that 
should be noted is that data within it 
are usually treated as scaled either at 
the nominal or at the ratio level. So 
long as we are concerned merely with 
identifying individuals as distin- 
guishable entities, we make only the 
assumptions of nominal scaling. 
When we treat individuals as equiv-
alent units that can be added to-
gether, however, and express P-
component data in terms of numbers
of cases or proportions of a total sam-
ple of subjects, we have made the
essential assumptions underlying
ratio scaling. The data could be ex-
pressed in ordinal form if the label
identifying the individual assumed 
the form of an index of rank within 
a social hierarchy. We could trans- 
form the data from ordinal form to 
presumably interval form either by 
making certain parametric assump- 
tions or by adopting some appro- 
priate measure of discriminability of 
adjacent ranks as an index of interval 
size. (Numerical data within the 
realm of Component P may assume 
any form consistent with the notion 
of measurement in terms of individuals. 
The application of measurement to 
individuals, however, yields R-com- 
ponent data.) 
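One common way of making "certain parametric assumptions" to move from ranks toward an interval scale is to assume an underlying normal distribution and replace each rank by the corresponding normal quantile. The sketch uses the (rank - 0.5) / n plotting position; both that choice and the normality assumption are illustrative, not a procedure prescribed in the text.

```python
from statistics import NormalDist


def normal_scores(n):
    """Map ranks 1..n to standard-normal quantiles at (rank - 0.5) / n."""
    z = NormalDist()  # mean 0, standard deviation 1
    return [z.inv_cdf((rank - 0.5) / n) for rank in range(1, n + 1)]


scores = normal_scores(5)
print([round(s, 2) for s in scores])
```

The gaps between adjacent scores are wider at the extremes than in the middle, so equal steps in rank are no longer treated as equal intervals.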


AN EXTENDED COVARIATION 
DESIGN CLASSIFICATION 


A consideration of the role played 
by variables of the four design com- 
ponents in the covariation chart re- 
veals that R-component variables 
are consistently assigned to the cells 
within the score matrices correspond- 
ing to Techniques R, Q, P, O, S, and 
T. The numbers in the body of a 
score matrix represent what we con- 
ceive of as the dependent variable in 
an experiment. In psychological re- 
search, the dependent variable is 
customarily, but not inevitably, the 
response variable. While our interest 
may lie in finding what sort of re- 
sponse will appear in a given situa- 
tion, we may seek, with equal justi- 
fication, to determine which individ- 
ual will give a particular response, 
which stimulus will evoke the re- 
sponse, or on what occasion the re- 
sponse will appear. If we thus permit 
any of the four design components to 
furnish the elements within the score 
matrix, we are led to the system of 
24 techniques shown in Table 2. 
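The count of 24 follows from choosing which component supplies the matrix elements (4 ways), which of the remaining three supplies the correlated variables (3 ways), and which of the last two supplies the series (2 ways), the remaining component being held constant: 4 × 3 × 2 = 24, one technique per ordering of R, S, P, and O. The sketch labels rows by component letters only and does not reproduce Table 2's technique names or row order.

```python
from itertools import permutations

components = ("R", "S", "P", "O")

# One row per ordering: (correlated, elements where variation is noted, series, constant).
designs = list(permutations(components))
print(len(designs))  # 24

# No component appears twice in any row.
assert all(len(set(row)) == 4 for row in designs)

# The covariation chart's six techniques are the rows whose matrix elements come from Component R.
r_element_rows = [row for row in designs if row[1] == "R"]
print(len(r_element_rows))  # 6
```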

It may be noted that no component 
appears twice in any row of Table 2. 
TABLE 2
AN EXTENDED SYSTEM OF COVARIATION DESIGNS

Technique  Variables    Variable in which    Series over which       Constant or
           correlated   variation is noted   covariation is studied  singular variable

NEMS SGU ASTRA HOARY UOvO® 
WAHAYRXHRONDORHDOVOR THOVONH 
SOSCSSCOWWWWYWWAHUHHERRAR RD 
AYA RARODORHYVORORYWVONOY 

RAHAT RHKNOODAYTOOUHAWYOS 


Note.—The letters in the second, third, 
fourth, and fifth columns refer to the design 
components from which variables are drawn. 


This classification system assumes 
that the two axes of the matrix and 
the elements within the matrix will 
generally represent three different de- 
sign components. Supporting this 
assumption is the fact that each de- 
sign component represents variables 
which are an integral part of any 
psychological event, and the ques- 
tions raised in psychological research 
normally refer to the manner in 
which variables of the different 
realms represented by the four com- 
ponents converge in a given psycho- 
logical event. It must be granted, 
however, that our assumption is, in 
some respects, an arbitrary one. It is
possible to conceive of designs in
which the axes and the matrix ele-
ments would not represent three dif-
ferent components, but such designs
can also be rationalized quite readily
as variants of techniques already in
the system. Whether the classification
system proposed here will generally 
provide the most convenient frame- 
work for design conceptualization 
must ultimately be determined 
through practical application. In 
any case, a classification system of 
this sort cannot be exhaustive if it is 
to remain fairly simple. It can merely 
provide a framework of basic proto- 
typal techniques. Some designs will 
inevitably appear as combinations or 
variants of these techniques. 

It must be emphasized that these 
techniques refer to research designs 
in which covariation is to be
observed, but they do not imply any
particular form of statistical analysis.
In general, the desired indices of
covariation will be furnished by
correlational methods. Whether a method
such as factor analysis or cluster
analysis will be applied subsequently is
an additional consideration. 


COVARIATION DESIGN AND 
CONCOMITANCE DESIGN


If we are interested in truly com- 
prehensive classification of psycho- 
logical research designs, we must 
recognize at the outset that most 
psychological experiments are not 
actually concerned with covariation. 
The simplest form of research would 
call for a single measurement. This 
measurement might fall within the 
realm of any of our four design com- 
ponents, and it could be thought of as 
the single element filling a single-cell 
matrix. The variables of the other 
three components would also be 
singular.

More commonly we speak of research
design when we seek data for a
matrix of at least two cells and where 
we are interested in a relationship 
among the ingredients of the matrix. 
The relationship may nearly always 
be considered in terms of a concomitance
of two or more elements falling
within the realm of one of our design
components, and these elements are 
related in terms of their convergence 
with elements corresponding to a 
different component. If we represent 
all variables or elements of a common 
design component along a common 
axis of a score matrix, the data of 
many experiments must be thought 
of as filling cells arranged serially in a 
single row or column. We relate 
either the single cell rows of a single 
column matrix or the single cell 
columns of a single row matrix. 
The kind of matrix we are now describing
is a truncated version of the
kind we assumed in classifying
covariation designs. We can speak 
meaningfully of concomitance with 
respect to two single cell rows, but 
not of covariation, for this assumes 
two relatable series of values. In a 
single column matrix, whatever com- 
ponent would otherwise have con- 
stituted a horizontal axis is now 
treated as singular. 

Nearly every psychological re- 
search design is concerned with con- 
comitance, but not necessarily with 
covariation (i.e., concomitant varia- 
tion). Since the covariation design is 
really a special case of concomitance 
design, it would be worthwhile to 
have a scheme of classification for 
concomitance designs which would 
parallel that for covariation designs. 
Such a scheme is presented in Table 
3. Since in each concomitance design 
the serial variable is replaced by an 
additional singular variable, each
concomitance design may be considered
a truncated version of either of two
covariation designs.


TABLE 3
BASIC CONCOMITANCE DESIGNS

Technique  Variables  Variables in which   Singular or         Parallel covariation
           related    variation is noted   constant variables  designs
Alpha      S          R                    P, O                R, P
Beta       P          R                    S, O                Q, S
Gamma      O          R                    S, P                O, T
Delta      R          S                    P, O                ...
Epsilon    P          S                    R, O                B, E
Zeta       O          S                    R, P                ...
Eta        R          P                    S, O                ...
Theta      S          P                    R, O                H, K
Iota       O          P                    R, S                ...
Kappa      R          O                    S, P                U, W
Lambda     S          O                    R, P                ...
Mu         P          O                    R, S                ...

Note. The letters in the second, third, and
fourth columns refer to design components
from which variables are drawn.
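The structure of Table 3 can be generated in the same way as Table 2. A concomitance design fixes the component whose variables are related and the component in which variation is noted, leaving the other two singular, which gives 4 × 3 = 12 designs. Turning either singular component back into a series yields the two covariation designs of which the concomitance design is a truncated version. The sketch reproduces the second through fourth columns of Table 3 in the table's order (the Greek names are copied from the table); it does not generate the letters of the parallel covariation designs, and the helper `parallel_covariation_designs` is a hypothetical name.

```python
components = ("R", "S", "P", "O")
names = ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta",
         "Eta", "Theta", "Iota", "Kappa", "Lambda", "Mu"]

# Table 3 order: group by the component in which variation is noted,
# then take the related component in the order R, S, P, O.
rows = []
for noted in components:
    for related in components:
        if related != noted:
            singular = tuple(c for c in components if c not in (related, noted))
            rows.append((related, noted, singular))

table_3 = dict(zip(names, rows))
print(table_3["Alpha"])    # ('S', 'R', ('P', 'O'))
print(table_3["Epsilon"])  # ('P', 'S', ('R', 'O'))


def parallel_covariation_designs(related, noted, singular):
    """The two covariation designs (correlated, elements, series, constant) truncating to this design."""
    a, b = singular
    return [(related, noted, a, b), (related, noted, b, a)]


print(parallel_covariation_designs(*table_3["Epsilon"]))
```

Every one of the 24 covariation designs appears exactly once among these 12 pairs, which is the sense in which each concomitance design is a truncated version of either of two covariation designs.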


APPLICATIONS OF CONCOMITANCE 
DESIGNS 


The techniques labeled Alpha, 
Beta, and Gamma in Table 3 repre- 
sent the most familiar forms of psy- 
chological research, and in them we 
find the most frequent application of 
such forms of statistical analysis as 
the critical ratio and analysis of vari- 
ance. Beta technique has a common 
application in the comparison of 
responses of groups which differ with 
respect to variables outside the range 
of observation within the experiment 
(e.g., two different occupational
groups, psychotics and “normals,”
men and women, etc.). Comparison 
of matched groups subjected to differ- 
ent stimulus conditions would con- 
stitute a form of Alpha technique, 
since P-component variables are held 
constant. Interest is here focused on 
the relating of stimuli, as in the 
simpler form of Alpha technique in- 
volving such a comparison for a 
single individual or single group of 
individuals. A compounding of techniques is possible in designs of more
than one-way classification. Thus, 
we should have a compound of Alpha 
and Beta techniques if we classified 
both in terms of known group mem- 
bership and in terms of stimulus con- 
ditions. The reader will note that the 
score matrix in terms of which we 
conceptualize the design differs from 
the tabular arrangement usually em- 
ployed with analysis of variance in 
that the variables to be related are 
represented along a common axis. 
Thus the score matrix for a complex 
factorial design of the Alpha-tech- 
nique variety would consist of a long 
single column of R data. Each row 
would represent the data for a group 
simultaneously scaled with respect 
to several stimulus dimensions. 

In Techniques Delta, Epsilon, and 
Zeta, the stimulus is conceptually the 
dependent variable. These tech- 
niques bring to mind certain applica- 
tions of psychophysical methods. 
Strictly speaking, the procedures 
usually called “psychophysical meth- 
ods,” as described by such writers as 
Graham (1950) and Guilford (1954), 
are methods of measurement and do 
not define specific experimental de- 
signs to any greater extent than do 
methods of statistical analysis. In 
actual application, however, they 
form a basis for a limited range of 
concomitance designs. 

The most common applications of 
psychophysical methods may be 
thought of as constituting either 
Alpha technique or Delta technique, 
depending largely on the use made of 
the data. The simple application of 
the method of constant stimuli, for 
example, would constitute Delta 
technique if we dealt with the resulting data in terms of a relationship between the two response categories. Each of the two cells of the corresponding score matrix would contain the value of the stimulus eliciting the
given response for a certain percent- 
age of trials. On the other hand, find- 
ings may be expressed by means of a 
curve in which stimulus magnitude 
is plotted against the percentage of 
trials in which either response is pro- 
duced. The design may then be con- 
sidered either Alpha technique or 
Delta technique, depending on 
whether we consider the curve as a 
way of expressing relationships 
within a continuous series of S cate- 
gories or within a continuous series of 
R categories (percentages in the pres- 
ent instance). Similar reasoning 
would apply, of course, to the appli- 
cation of other psychophysical meth- 
ods. More complex applications of 
these methods, in which R variables 
are related to a combination of inter- 
acting S dimensions—as in Lick- 
lider’s (1951) treatment of auditory 
functions—may be regarded as com- 
parable to the application of factorial 
design in Alpha technique. Psycho- 
physical methods are less commonly 
applied in research classifiable as 
Epsilon or Zeta technique, although 
certain applications of these methods 
in clinical research (e.g., certain 
studies involving flicker fusion, size 
judgments, distance judgments, and 
judgments of the vertical) would 
certainly qualify as Epsilon tech- 
nique. 
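To make the Delta reading of the method of constant stimuli concrete: each cell holds the stimulus value that elicits a given response on a fixed percentage of trials. The sketch below is a minimal illustration, not a standard psychophysical procedure; it recovers such a value by linear interpolation between adjacent stimulus levels, assuming hypothetical data in which the proportion of "greater" judgments rises monotonically with stimulus magnitude. The function name `interpolate_threshold` is ours.

```python
def interpolate_threshold(levels, proportions, target=0.5):
    """Return the stimulus value at which the proportion of a given response
    reaches `target`, by linear interpolation between adjacent levels.

    Assumes `levels` are increasing and `proportions` rise monotonically
    across them (hypothetical, illustrative data only).
    """
    if len(levels) != len(proportions):
        raise ValueError("levels and proportions must have equal length")
    pairs = list(zip(levels, proportions))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        # Find the bracketing interval that contains the target proportion.
        if p0 <= target <= p1 and p1 > p0:
            return x0 + (target - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("target proportion is not bracketed by the data")


# Hypothetical constant-stimuli data: five stimulus magnitudes and the
# proportion of trials on which the response "greater" was given.
levels = [1.0, 2.0, 3.0, 4.0, 5.0]
proportions = [0.10, 0.30, 0.60, 0.80, 0.95]

# Delta-style reading: the stimulus value eliciting "greater" on 50% of trials.
print(round(interpolate_threshold(levels, proportions), 3))  # 2.667
```

Read the other way, the same list of proportions plotted against stimulus levels is the curve described in the text, which may equally be treated as Alpha technique.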

Techniques Eta, Theta, and Iota 
are a common realm of application 
for nonparametric techniques of sta- 
tistical analysis. Depending on the 
manner in which P-component data 
are expressed, we may analyze find- 
ings in terms of cell frequencies, over- 
lap of cases among cells, or compara- 
bility of person ranks associated with 
various cells.

Techniques Kappa, Lambda, and 
Mu are most likely to be useful when 
variations in an occasion variable are 
presumed to covary with certain 
attributes of subjects or with certain
changes in the life situations of sub- 
jects. The O-component variable 
may thus reflect such things as age, 
developmental stage, and level or 
stage of experience. The develop- 
mental area is probably the most 
common realm of application. Kappa 
technique provides a means for 
grouping behaviors developmentally 
and hence for defining developmental 
stages. Lambda technique provides a 
way of defining stages in terms of 
effective stimuli. Mu technique can 
be used to compare individuals with 
respect to such things as rate of 
maturation. Many applications out- 
side the developmental realm, to 
processes involving shorter time 
spans, are possible. 


APPLICATIONS OF COVARIATION 
DESIGNS 


A detailed discussion of possible 
applications of the familiar Tech- 
niques R, Q, P, O, S, and T would be 
superfluous here. Unfortunately, 
other treatments of these techniques 
have promoted misconceptions by 
obscuring three interrelated con- 
siderations that are basic to con- 
sistent classification. First, there is 
the distinction between concomitance 
and covariation designs. A second 
vital point is that the series over 
which covariation is observed must 
be genuinely treated as a series in 
covariation designs. Wherever a group is treated as a unit and a group average is treated as a single observation, the group functions, for design classification purposes, as a single individual. Finally, in research employing matched groups, the P component is properly viewed as being held constant, and appropriate classification will depend on what component is confounded with Component P. For example, in the common type of experiment in which equated control and experimental groups are subjected to different stimulus conditions, we have an instance of P technique (not S technique, as some writers would have it), provided that
response covariation is considered 
over time. The usual application of 
this design, to a single occasion, is 
simply Alpha technique. 

The remaining techniques—A 
through L and U through Z—repre- 
sent virtually unexploited forms of 
design, but careful consideration will 
suggest appropriate uses for each of 
them. In Techniques A through F, 
the dependent variable is of Compo- 
nent S. Appropriate quantification 
might be in terms of minimally 
sufficient stimulus magnitude or 
mean stimulus magnitude associated 
with a given response. In Techniques 
G through L, the dependent variable 
is of Component P. It may be ex- 
pressed in terms of the rank of the in- 
dividual giving a certain response to 
a certain stimulus on a certain occa- 
sion, in terms of the average rank of 
individuals so responding, or in
terms of the number of individuals so 
responding. In Techniques U, V, W, 
X, Y, and Z, our focal variable—of 
Component O—may be expressed in 
terms of a single occasion in an 
ordered series, an average of a num- 
ber of ordered occasions, an average 
age, an average stage, etc. It is im- 
portant to note that in covariation 
designs, in contrast to concomitance 
designs, the dependent variable must 
be of at least the ordinal level of scal- 
ing. Thus, in Techniques Eta, Theta, 
and Iota, the matrix cells could 
simply contain tags identifying the 
persons fitting the cell coordinates. 
Data analysis would then consist of 
assessing the overlap of entries in 
various cells. In a matrix of several rows and columns containing such nominal data, we could probably speak of “multiple concomitance” with respect to a pair of rows or
columns, but it is doubtful that we 
can properly speak of “covariation” 
unless the data in the body of the 
matrix are expressed in a form repre- 
senting relative magnitudes or posi- 
tions on continua. 
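The contrast between nominal tags and data expressing relative position can be sketched as follows. With tags, about all one can compute is the overlap of persons between two cells; once the entries are ranks, a coefficient of covariation such as Spearman's rank correlation becomes meaningful. Both functions, the Jaccard-type overlap index, and the data are illustrative assumptions, not procedures prescribed by the classification.

```python
def tag_overlap(cell_a, cell_b):
    """Nominal data: each cell holds tags naming the persons who fit its
    coordinates. Return the shared tags and their proportion of all tags
    appearing in either cell (a Jaccard-type index, used as an example)."""
    shared = cell_a & cell_b
    union = cell_a | cell_b
    return shared, (len(shared) / len(union) if union else 0.0)


def rank_correlation(ranks_a, ranks_b):
    """Ordinal data: Spearman's rho for two untied rankings of the same
    entries, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of equal length >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical cells: persons giving response R1 (or R2) to one stimulus.
cell_r1 = {"p1", "p2", "p3"}
cell_r2 = {"p2", "p3", "p4"}
shared, index = tag_overlap(cell_r1, cell_r2)
print(sorted(shared), index)  # ['p2', 'p3'] 0.5

# Once cell entries express relative position (here, ranks of four
# stimuli under two response categories), covariation can be assessed.
print(round(rank_correlation([1, 2, 3, 4], [2, 1, 3, 4]), 3))  # 0.8
```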

Techniques A, C, G, I, U, and W 
all deal with the covariation of re- 
sponse categories and thus provide a 
basis for defining the structure of a 
response realm. In C and U, we 
assess the comparability of response 
categories on an intra-individual 
basis, with respect to conjoint ap- 
pearance on various occasions or in 
response to various stimuli. Tech- 
niques A, G, I, and W provide a 
means for assessing response com- 
parability on a group basis, in terms 
of similarity of the precipitating 
stimulus, of the occasion of mani- 
festation, or of the persons giving the 
response. 

Techniques R and P are familiar 
techniques for examining stimulus 
covariation in terms of the resulting 
response. Techniques H, K, V, and 
Y add possibilities for correlating 
stimuli in terms of covariation with 
respect to the magnitudes (ranks) or 
numbers of persons responding in a 
certain way or in terms of the par- 
ticular occasions or numbers of occa- 
sions on which the stimulus has a 
given effect. The correlating of per- 
sons is also a familiar idea by virtue 
of its introduction through Q and S 
techniques. Techniques B, E, X, and Z point to the possibility of correlating persons according to stimuli producing various responses, stimuli producing a given response on various
occasions, occasions when various re- 
sponses appear, or occasions when 
various stimuli elicit a given response. 
Consideration of the many possible 
ways of defining the basic stimulus, 
response, and occasion data suggests 
a great variety of ways of grouping 
persons according to such things as 
physiological cycles, social roles, and 
developmental patterns. 

In applying any of the occasion-correlation techniques (O, T, D, F, J, and L), we may select presumably
equivalent occasions and thus ob- 
tain an estimate of the reliability, or 
stability, of a given pattern of rela- 
tionship. We may, on the other hand, 
select occasions differing in a known 
way and thereby determine the com- 
parability of these occasions. Pos- 
sible applications range from the 
psychophysical realm to the develop- 
mental realm, depending on how the 
occasion variable is defined and 
quantified. In general, the new co- 
variation techniques encompassed by 
this expanded classification system 
promise a rich harvest through novel 
approaches to diverse problems— 
particularly in the developmental, 
social, and physiological areas, where 
the possible fruits of correlational 
analysis have been recognized by too 
few researchers. 


THE SELF-CONCEPT:
FACT OR ARTIFACT? 


C. MARSHALL LOWE¹
Ohio State University 


One of the more difficult tasks for 
psychology is relating the observa- 
tion of behavior to the study of 
mental processes. One approach to 
the problem has been to limit psy- 
chology to the study of behavior and 
to leave to philosophy the task of 
speculating as to the existence and 
nature of mind and soul. 

There have, however, been psy- 
chologists who have sought to make 
sense out of human action by positing 
a self or ego, in order that they might 
understand the coherence and unity 
which they have thought that they 
have seen in human behavior. Thus, 
G. W. Allport (1943) claimed that 
the concept of ego was made neces- 
sary by certain shortcomings in as- 
sociationism, and he went on to list 
eight different uses for the concept of 
the ego. During the 1940s the 
Psychological Review was in fact well- 
flavored with articles of philosophical 
taste (Allport, 1943; Bertocci, 1945; 
Chein, 1944; Lundholm, 1940). These 
articles were attempts to find the 
source of human behavior by dis- 
cussions of concepts, but they failed 
to make a lasting distinction between 
the self as subjective knower and the 
self as object of knowledge. The self 
as essence defied definition, and the 
discussions concerning the nature of 
mind seemed relevant for neither 
experimental nor applied psychology. 

But during the 1940s there was a 
parallel attempt at construction of a 
useful concept of the self. While Rogers wrestled with the problem of researching a client-centered approach in psychotherapy, one of his students (Raimy, 1943) developed a construct of the self which had a perceptual frame of reference. What Raimy called the self-concept was
both a learned perceptual system 
functioning as an object in the per- 
ceptual field, and a complex organiz- 
ing principle which schematizes on- 
going experience. Raimy demon- 
strated in his dissertation that atti- 
tudes toward the self can be found 
by analyzing counseling protocols, 
and that these self-perceiving atti- 
tudes formed a reliable index for 
improvement in psychotherapy. 

The concept of the self soon 
formed the theoretical underpinning 
for a new approach to the study of 
behavior. Raimy’s construct of the 
self received further development in 
the book Individual Behavior (Snygg 
& Combs, 1949). The authors stated 
that behavior was best understood as 
growing out of the individual sub- 
ject’s frame of reference. Behavior 
was to be interpreted according to 
the phenomenal field of the subject 
rather than be seen in terms of the 
analytical categories of the observer. 

As the self-concept was born with client-centered therapy, so congruent were the theory of the self and the practice of psychotherapy that a new self centered therapy became theoretical for the first time: Rogers (1951) described therapeutic change in a phenomenological frame of reference.

By 1950 the phenomenological
view of the self had become the 
center of a new movement in psy- 
chology, having already generated a 
block of research studies (Rogers et 
al., 1949). When Hilgard (1949) pos- 
tulated in his APA presidential ad- 
dress the need for a self to under- 
stand psychoanalytic defense mecha- 
nisms, and called for research on the 
self, psychology listened. To the 
desert came rain that washed all 
before it. 

The deluge of studies within the 
last decade has not been contained 
within any one theoretical channel, 
so that studies involving the self- 
concept have spread over into many 
areas of psychology. Ten years of 
research efforts have produced a 
mass of data, reflecting different 
theoretical assumptions and differing 
research methods. While the time 
has now passed for one article to 
deal adequately with all the studies 
that have been done, the sheer mass 
of evidence would suggest that cer- 
tain questions be asked of theories of 
the self-concept. 

This paper is concerned with the 
problem as to whether the self is an 
objective reality which is a fit field 
for psychological research, or whether 
it is a somewhat nebulous abstrac- 
tion useful only to give a theoretical 
basis to things the psychologist 
could not otherwise understand. Put 
in other words, this paper faces the 
issue as to whether the results of 
studies of the self are to be accepted 
at face value, or whether other ex- 
planations of results would be more 
parsimonious or reasonable.

The writer will discuss first at- 
tempts to quantify data concerning 
the self-concept to arrive at an 
operational definition. We will then 
assess the validity of measures of the 
self-concept, and will relate the self- 
concept to other constructs. We will
briefly allude to attempts to estab- 
lish a relationship between different 
measures. Finally, the writer will 
return to certain philosophical and 
historical considerations in order to 
reach a conclusion as to whether the 
self-concept is indeed a fact of na- 
ture, or an artifact of men’s minds. 


MEASURING THE SELF-CONCEPT 


Many psychologists have believed 
that if something exists it can be 
measured. There have been many 
investigators who have assumed that 
the self-concept refers to an existence 
of some sort and have gone on to 
measure it. 

The most popular type of opera- 
tional definition has assumed that 
the self-concept can be defined in 
terms of the attitudes toward the 
self, as determined either by the 
subject's references to himself in 
psychotherapy or by asking him to 
mark off certain self-regarding atti- 
tudes on a rating scale. 

One of the first attempts at atti- 
tude measurement was by Sheerer 
(1949), who extracted from the pro- 
tocols of cases at the University of 
Chicago Counseling Center all state- 
ments that were relevant either for 
attitudes to self or to other people. 
These statements formed the basis 
for a 101-item rating scale. The 
Sheerer client statements also formed 
the basis for rating scales constructed 
by Phillips (1951) and by Berger 
(1952). 

The only rating scale of attitudes 
towards self that has been published 
is the Index of Adjustment and Val- 
ues (Bills, 1958). Bills states that the 
intent of the index is to measure the 
phenomenological self view as de- 
scribed by Lecky (1945), Snygg and 
Combs (1949), and Rogers (1951). 
This scale is more elaborate in that 
each item is ranked with three different instructions. First, the subject
ranks the item on a scale as to how 
well it describes himself. Next, he 
marks the items as to how acceptant 
he is of his first, or self-rating of the 
item, and finally he rates the item as 
to the degree to which he aspires to 
be like that item. 

The scoring of the Bills index also
is more elaborate than that tradi- 
tional for rating scales. There are in 
fact two different measures, neither 
one being simply a rating of items in 
absolute terms, as in the scales previ- 
ously described. Bills’ measures de- 
pend instead upon the differences 
between ratings made under different 
instructions. A measure of self-ac- 
ceptance is provided by the degree of 
similarity between the way the sub- 
ject sees himself as being, and the 
way he rates himself as accepting his 
self-ratings. A measure of self- 
ideal-self discrepancy is given by 
comparing the differences in ratings 
between the way the self is rated as 
being, and the way the self is rated 
as wishing to be. 
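The text does not give Bills' scoring formulas, so the following is only a hypothetical sketch: it treats each measure as the sum of absolute item-by-item differences between two sets of ratings, so that a smaller value means closer agreement (for self-acceptance, greater similarity between self-rating and acceptance rating). The 1-5 rating range, the four items, and the function name `discrepancy` are all assumptions.

```python
def discrepancy(ratings_a, ratings_b):
    """Sum of absolute item-by-item differences between two sets of ratings
    of the same items; 0 means the ratings coincide on every item.
    (Hypothetical scoring rule, not Bills' published key.)"""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both ratings must cover the same items")
    return sum(abs(a - b) for a, b in zip(ratings_a, ratings_b))


# Hypothetical 1-5 ratings of four trait items by one subject under the
# three instructions described above.
self_rating = [3, 4, 2, 5]  # how well the item describes me
acceptance = [3, 3, 2, 5]   # how acceptant I am of that self-rating
ideal = [5, 4, 4, 5]        # how much I would like to be like the item

print(discrepancy(self_rating, acceptance))  # 1 (smaller = more similar)
print(discrepancy(self_rating, ideal))       # 4 (self-ideal-self discrepancy)
```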

Brownfain (1952) made still an- 
other adaptation in the use of the 
rating scale, deriving a measure of 
what he termed the stability of the 
self-concept. Subjects ranked them- 
selves on 25 words and phrases, each 
describing a different area of per- 
sonality adjustment. The measure is 
not of how sure the subject is of him- 
self, but of how sure he is of what he 
thinks about himself; the subject is 
instructed to make the ratings twice, 
first with an optimistic frame of 
reference, and then with a pessimistic 
one. The degree of congruence be- 
tween the two ratings is termed the 
degree of stability of the self-concept. 

A different theoretical approach 
towards measurement of self-con- 
cept involves the use of Q technique. 
Stephenson (1953) describes how
one’s “inner experiences” can be 
translated into behavior by means of 
Q sort, through which the phenome- 
nal field is translated into action. 
Using this method, two of Stephen- 
son’s students at the University of 
Chicago derived a conceptual self- 
system in an intensive study of a 
single subject (Edelson & Jones, 
1954). 

Others at the University of Chi- 
cago have used Q sorts as a measure 
of self-concept, in an attempt to 
assess changes in self-concept during 
psychotherapy (Rogers & Dymond, 
1954). Statements were taken from 
counseling protocols, and were sorted 
both for real self and for ideal self. 
The degree of congruence between 
the two sorts is taken as a measure of 
adjustment. 
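Congruence between the two sorts is often expressed as a product-moment correlation between the pile numbers each statement receives in the self sort and in the ideal sort. The sketch assumes that index, a hypothetical deck of nine statements, and a five-pile forced distribution; it first checks that each sort respects the distribution and then correlates the two sorts.

```python
import math
from collections import Counter


def is_forced(sort, distribution):
    """True if the pile placements follow the forced distribution
    (pile number -> number of statements that must be placed there)."""
    return Counter(sort) == Counter(distribution)


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical forced distribution for nine statements over five piles
# (1 = "least like me" ... 5 = "most like me").
distribution = {1: 1, 2: 2, 3: 3, 4: 2, 5: 1}

# Pile assigned to statements 1-9 in each sort (same statement order).
self_sort = [1, 2, 2, 3, 3, 3, 4, 4, 5]
ideal_sort = [2, 1, 3, 2, 3, 4, 3, 5, 4]

assert is_forced(self_sort, distribution) and is_forced(ideal_sort, distribution)
congruence = pearson(self_sort, ideal_sort)
print(round(congruence, 3))  # 0.667
```

Because both sorts share the same forced distribution, their means and variances are fixed in advance, so the correlation reflects only how the statements are reordered between the two sorts.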

Attempts to measure the self-con- 
cept face three difficulties. First, it 
must be demonstrated that the 
operational and philosophic meanings 
are in fact equivalent. In the case of 
the self-concept it needs to be shown 
that the “inner experience” is effec- 
tively conveyed by the outward 
movement of making check marks on 
lines, or sorting cards. Secondly, an 
efficient and systematic method must 
be found for selecting items for the 
scales and sorts, the problem being 
that of defining the universe from 
which items are to be selected. Fi- 
nally, the different measures imply 
different operational definitions. Just 
as one can not multiply apples and 
pears, so is it impossible to inter- 
change different operational defini- 
tions as if they were the same, or to 
pretend that each means the same 
thing by the term self-concept.

If something is measured does it 
exist? If the answer is yes, we must 
still be aware that we may not fully 
understand what we are measuring. 


328 C. MARSHALL LOWE 


One must measure, but must then 
compare and carefully validate. 


VALIDATION OF SELF-CONCEPT 
MEASURES 

A psychological construct stands 
and falls according to how useful it is 
in understanding human behavior. 
A term is meaningful only when 
successful validation studies have 
found significant relationships with 
established variables. 

It has been popular to validate 
self-concept scales against tests pur- 
porting to measure maladjustment 
in an attempt to demonstrate that 
one’s phenomenological view of the 
self is closely related to the degree of 
adjustment. Positive results abound. 
Calvin and Holtzman (1953) had 
college students rank themselves on 
seven personality traits, and found 
that self-depreciation was related to 
high scores on the MMPI. Zucker- 
man and Manashkin (1957) had 
neuropsychiatric patients rate them- 
selves on a scale of adjectives, and 
found that self-ratings correlated 
positively with the MMPI K scale, 
and negatively with seven of the 
other scales. Taylor and Combs 
(1952) tested the hypothesis that 
sixth grade children found to be well- 
adjusted on the California personal- 
ity scale would more often admit 
statements of self-reference which 
though unflattering were universally 
true. They got positive results, the 
self-depreciation which in other self- 
concept measures is treated as vice 
being here treated as virtue. Hanlon, 
Hofstaetter, and O’Connor (1954) 
compared the results of high school 
juniors on the California personality 
scale with the degree of congruence 
between ratings of the real and ideal 
self and found that the more con- 
gruence the better the adjustment. 
Cowen (1954) related low self-ratings on the Brownfain negative self-concept with high scores on the California F Scale. Any doubt about the
ability of investigators to find posi- 
tive results when comparing good 
adjustment as measured by objective 
personality inventories with the 
affirmativeness of self-concept should 
be dispelled by a study by Smith 
(1958). He compared congruence 
between Q sorts for self and ideal 
self with scores on the Edwards PPS, 
the Cattell factors, and measures of 
average mood. After making almost 
300 correlations, he concluded that 
having a positive self-concept is in- 
deed related to adjustment. 

Other investigators have doubted 
that the relationship between adjust- 
ment and self-satisfaction is such a 
simple one. Block and Thomas 
(1955) conceived of maladjustment 
lying at both ends of the continuum. 
They felt that too high a degree of 
self-satisfaction is due to suppressive 
and repressive mechanisms which 
cause a person to be rigid, over-con- 
trolled, restrained, and aloof. But at 
the other extreme, the person who is 
too little satisfied with self will lack 
ego defenses, and will be able neither 
to bind tensions nor control emotions. 
Block and Thomas constructed an 
ego-control scale from MMPI items. 
The scale was found to have a corre- 
lation of .44 with self-ideal-self Q 
sort congruence, the relationship 
being curvilinear. Unfortunately,
this was the reverse of what Chodor- 
koff (1954a) had found. Correlating 
ratings of the self as made from a bio- 
graphical inventory with the results 
of projective techniques, he found 
that maladjustment lies in the middle 
range of self-satisfaction. 
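The text reports a correlation of .44 together with a curvilinear relationship without naming the coefficient. As a general illustration only, not a reanalysis of Block and Thomas or of Chodorkoff, the sketch shows that a symmetric U-shaped relation can yield a product-moment r of exactly zero while the correlation ratio (eta), which groups the criterion by each value of the predictor, still detects a strong association. The data are invented.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_ratio(x, y):
    """Eta: square root of (between-group SS / total SS), where y is grouped
    by the value of x. It does not assume the relation is a straight line."""
    grand = sum(y) / len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    ss_total = sum((yi - grand) ** 2 for yi in y)
    return math.sqrt(ss_between / ss_total)


# Invented data: x is a five-step predictor, y a criterion that is high at
# both extremes of x and low in the middle (a U-shaped relation).
x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y = [4, 5, 1, 2, 0, 1, 1, 2, 4, 5]

print(round(pearson(x, y), 3))            # 0.0
print(round(correlation_ratio(x, y), 3))  # 0.958
```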

Validating self-concept measures 
against objective personality tests 
has generally been successful, but the 
true significance of these studies has still not been made clear. Edwards (1957) demonstrates how more than half the variance in both MMPI scales and in Q sorts of self-referent items is accounted for by social desirability (SD). SD can account for significant positive relationships even when other variables are totally unrelated. Edwards’ SD robs these studies neither of significance nor of interest, but does suggest that extreme care must be taken in the labeling of constructs.

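Edwards' point can be illustrated with a small simulation. The assumptions are ours, not Edwards': each of two scale scores is a shared social-desirability component plus a component specific to that scale, the specific components are generated independently of one another, and social desirability supplies about 61% of each scale's variance (variance 1 against 0.64). The two scales then correlate substantially although their specific contents are unrelated.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1957)  # fixed seed so the run is repeatable
n = 5000

# Shared social-desirability (SD) component, variance 1.
sd_component = [rng.gauss(0, 1) for _ in range(n)]
# Scale-specific components, standard deviation 0.8 (variance 0.64),
# drawn independently of each other and of SD.
specific_a = [rng.gauss(0, 0.8) for _ in range(n)]
specific_b = [rng.gauss(0, 0.8) for _ in range(n)]

scale_a = [s + e for s, e in zip(sd_component, specific_a)]
scale_b = [s + e for s, e in zip(sd_component, specific_b)]

r_specific = pearson(specific_a, specific_b)  # theoretical value 0
r_scales = pearson(scale_a, scale_b)          # theoretical value 1/1.64
print(f"specific parts: r = {r_specific:.2f}")
print(f"scale scores:   r = {r_scales:.2f}")
```

Under these assumptions the expected correlation between the two scales equals the share of variance due to social desirability, 1/1.64, or about .61, even though the specific components are uncorrelated.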
Attempts have also been made to 
validate self-concept against projective personality tests. Bills has made
several attempts to validate his scale 
by the Rorschach (Bills, 1953a, 1954; 
Bills, Vance, & McLean, 1951). The 
results are a bit ambiguous, and 
leave two observers (Cowen & 
Tongas, 1959) extremely dissatisfied. 
The TAT was used by Friedman (1955) to compare the Q sort discrepancy self with the self as projected onto the TAT pictures. The normals were the only group to project positive self-qualities. Neurotics and paranoids both projected negatively.

A different approach to validation has used a word association test. Results show that there is a delayed reaction time for those trait words where there has been a discrepancy in ratings between the self and the ideal self (Bills, 1953b; Roberts, 1952). Delayed associations are assumed to be related to defensiveness about self, which in turn is considered to be related to maladjustment. However, Cowen and Tongas (1959) wonder if defensiveness about trait words does not serve also to raise the original ratings of the actual self.

Cowen chose to validate the self-concept by comparing the absolute self-rating with the learning time for the rated words, and found that there was a higher learning time for
words that were presumably threat- 
ening. We might however wonder if 
his and Tongas’ criticism of other 
studies does not apply here also: 
defensiveness might also cause self- 
ratings to be raised. 

Use was also made of the percep- 
tual New Look. Chodorkoff (1954b) 
presented neutral and threatening 
words with a tachistoscope, and 
found that the better the agreement 
between a self-description and a 
description of the self by others, the 
less perceptual threat there will be. 

It is unfortunate that the only 
study of this general type that did 
not use college students as subjects 
was negative in its results. Zimmer 
(1954) presented male mental pa- 
tients with trait adjectives on which 
there was a self-rating discrepancy 
between self and ideal self. A word 
association test was not found to be 
significantly related to self-discrep- 
ancy. 

The results of studies that involve 
the presentation of “hot” or threat- 
ening words seem suggestive, for 
there seems to be a common element 
in ability to free associate and learn 
threatening words. But it is possible 
that we have in these studies more a 
measure of ego defenses than of maladjustment. The fact that the
results are positive only with normal 
groups might suggest that the results 
are more relevant for a theory of 
personality than for a theory of 
psychopathology. In these studies it 
is indeed likely that we have support 
for Lecky’s theory of self-consistency, and for Snygg and Combs’
theory of the maintenance of internal 
organization. If this is so, then likely 
it is true as Block and Thomas (1955) 
suggest that only extremes in ego 
control are pathological. 


A different approach to validation 
of self-concept measures uses behavior in a social situation as a
criterion. The most sweeping results 
in a study of this type are reported 
by Turner and Vanderlippe (1958), 
who report that Q sort congruence 
between the self and the ideal self is 
greater in those college students who 
are more active in extracurricular 
activities, have higher scholastic 
averages, and are given higher socio- 
metric rankings by fellow students. 
Holt (1951) found that agreement 
between self-ratings and ratings by a 
diagnostic council was positively re- 
lated to intelligent, active, adven- 
turous living, and a friendly domi- 
nant social adjustment. Eastman 
(1958) found that the degree of ac- 
ceptance of self-ratings on the Bills 
index is positively related to ratings 
for marital happiness. Working in 
terms of ratings for maladjustment, 
Chase (1957) found that among mal- 
adjusted patients there was greater 
discrepancy between Q sorts for self 
as compared with sorts for the ideal 
self and the average other person. 
Other attempts to relate self-con- 
cept to social behavior have been less 
successful. Kelman and Parloff 
(1957) obtained only chance results 
when they tried to interrelate such 
variables as congruence between self 
and ideal self, a symptom disability 
check list, a discomfort evaluation 
scale, sociometric ratings, and an in- 
effective behavior evaluation scale, 
using 15 neurotic hospital outpa- 
tients. Fiedler, Dodge, Jones, and 
Hutchins (1958) measured the self- 
concept of college students both by a 
simple rating scale and by a discrep- 
ancy measure. There was a general 
lack of correlation between these 
measures and such objective criteria 
as grade point average, health center 
visits, army adjustment, the Taylor MA scale, and sociometric ratings.

Coopersmith (1959) compared self-esteem as rated by the self with self-esteem as estimated by observers, using children as subjects. He suggests that
there are actually four types of self- 
esteem: what a person purports to 
have, what he really has, what he 
displays, and what others believe he 
has. 
There is no obvious explanation for the discrepancy of results in studies purporting to relate self-concept to behavioral adjustment. Since the basis for selecting items for rating scales and Q sorts differs from study to study, it is possible that the statements used in the scales of those studies with positive results had more of a relationship to the criteria than the statements in studies which were negative.
A different approach to relating 
self-concept measures to adjustment 
is shown in a block of psychotherapy 
research studies at the University of 
Chicago (Rogers & Dymond, 1954). 
Change in self-concept was found to 
occur as a function of improvement 
during psychotherapy. Butler and 
Haigh (1954) had clients make Q
sorts for self and for ideal self both 
before therapy and after its comple- 
tion to test the hypothesis that ther- 
apy will increase satisfaction with the 
self. Congruence between the two 
sorts increased as a result of psycho- 
therapy, the two sorts moving to- 
wards a common mean. Rudikoff 
(1954), using the same subjects, 
found changes during periods of time 
before and after therapy were not 
nearly as great as those occurring 
during therapy. Also with the same 
subjects, Dymond (1954) found that 
there was closer agreement after 
therapy between the way clients 
sorted the Butler and Haigh Q sort 
cards, and the way two non-Rogerian
clinical psychologists sorted the cards 
between what the well-adjusted per- 
son should say is like him and what is 
not like him. 

The same investigator (Cart- 
wright, 1958) related change in self- 
concept over therapy to a successful 
search for identity. She had clients 
make sortings with Butler and Haigh 
Q sort cards to describe themselves as 
they saw themselves in relationship 
to three people of their choice to test 
the hypothesis that successful ther- 
apy increases the consistency of the 
self-concept which one brings to 
different social situations. The hy- 
pothesis was confirmed. 

Ewing (1954) had counselee col- 
lege students rate a list of traits for 
self, ideal self, mother, father, coun- 
selor, and a culturally approved 
figure. There was a regression of the 
ratings toward a common mean in 
those clients who were estimated to 
be the most improved in therapy. 

Changes in self-ratings over ther-
apy certainly seem to have occurred.
But they seem to take place also with- 
out psychotherapy. Taylor (1955) 
devised a Q sort divided between posi- 
tive and negative statements. After 
subjects made repeated sortings both 
for self and for ideal self, he concluded 
that self-introspection without ther- 
apy results in increased positiveness 
of attitude toward the self; that the 
self and ideal self will draw closer to- 
gether; and that repeated self-de- 
scriptions are accompanied by in- 
creased self-consistency. Engel (1959) 
studied the stability of self-concept 
in adolescence, and also found a 
trend towards more positive Q sort- 
ing over a 2-year period. And finally 
Dymond herself (1955) found an 
increased congruence between Q sorts 
for self and for ideal self among sub-
jects waiting for psychotherapy, al-
though ratings of adjustment
on TAT protocols showed no change
over the period.

Dymond attributes the increase in
self/ideal-self congruence without psy-
chotherapy to the strengthening of
neurotic defenses. It might be argued
that similar changes during therapy
have the same basis.
Dymond also raises the possibility 
that the sorts can be influenced by 
the attitude of the therapist towards 
the client's self. There is, in short, no
complete assurance that the cognitive
self-acceptance as measured by the Q 
sort is related to the deeper level of 
self-integration that client-centered 
therapy seeks to achieve. 

Indirect evidence of change of the 
self-concept during counseling is pro- 
vided by studies showing changes of 
self-estimates. Several studies show 
that agreement between self-ratings
on interests and the ratings of the self
by interest inventories increases as a
result of counseling (Berdie, 1954;
Froehlich, 1954; Johnson, 1953;
Singer & Stefflre, 1954). The first two
of these studies show a moderate in- 
crease in accuracy in predicting one’s 
intelligence, but very little improve- 
ment in rating the self on measures 
of personality. One might reason that 
some parts of the self-concept are 
peripheral to the core of the self (e.g., 
interests) and are therefore unstable, 
while other parts (e.g., personality 
estimates) are central to the self and 
are therefore extremely resistant to 
change. 
SELF-CONCEPT—SELF-CONSISTENCY 


If the self-concept is to have useful- 
ness as a construct it must be shown 
that it is consistent in a given self. It 
must be known whether the self-con-
cept is a gestalt that is more than the
sum of different self-regarding atti-
tudes, or whether instead the self-
concept is an impossible attempt to
generalize different feelings toward 
unique situations. 

One answer to this question is pro- 
vided by Akeret (1959). He inter- 
correlated self-ratings on academic 
values, interpersonal relations, sex-
ual adjustment, and emotional ad-
justment, obtaining positive but
unequal intercorrelations. Emo-
tional adjustment was the best indi-
cator, correlating +.61 with a total
corrected for part-whole inflation.
While Akeret interpreted his results 
as suggesting that an individual does 
not accept or reject himself totally, 
the results might also be interpreted 
as suggesting that some areas of self- 
regard are more central to the self- 
concept than other areas. 
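One common way to correct an area-with-total correlation for part-whole inflation is to correlate each area with the sum of the remaining areas, so that no area is correlated partly with itself. Whether Akeret used this method or an algebraic correction is not stated here; the sketch assumes the "total of the other areas" approach, and the ratings are simulated, not Akeret's. The name `corrected_part_whole` is illustrative.

```python
import numpy as np


def corrected_part_whole(scores):
    """Correlate each column (area) with the sum of the other columns.

    scores: array of shape (n_subjects, n_areas).
    """
    x = np.asarray(scores, dtype=float)
    total = x.sum(axis=1)
    result = []
    for k in range(x.shape[1]):
        rest = total - x[:, k]  # total with the area itself removed
        result.append(float(np.corrcoef(x[:, k], rest)[0, 1]))
    return result


# Simulated, mutually independent self-ratings for four areas.
rng = np.random.default_rng(1)
ratings = rng.normal(size=(2000, 4))
naive = [float(np.corrcoef(ratings[:, k], ratings.sum(axis=1))[0, 1]) for k in range(4)]
corrected = corrected_part_whole(ratings)
```

With independent areas the uncorrected correlations sit near 0.5 merely because each area is part of the total, while the corrected ones sit near zero.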

Consistency in the self-concept was
found by Martire and Hornberger
(1957), who found very great simi-
larities between measures of the
actual self, the ideal self, and a 
socially desirable self. But incon- 
sistency was found by McKenna, 
Hofstaetter, and O’Connor (1956), 
who found that one’s self ideal dif- 
fered less from one’s close friends 
than the close friends differed from 
each other. These investigators 
concluded by rather involved reason- 
ing that the ideal self is sufficiently 
differentiated to seek different need 
satisfactions in different people. 

The search for consistency in the 
self led also to comparing scores on 
different measures of self-concept. 
Omwake (1954) compared three scales 
—the Bills, Phillips, and Berger— 
which measure acceptance both of 
self and of others. The scales were in 
closer agreement as to the degree of 
acceptance of self than they were as 
to acceptance of others. Brownfain
(1952) found that low ratings of self
were related on his scale to the dis-
crepancy between optimistic and
pessimistic self-ratings, or what
Brownfain termed stability of self-
concept; and Cowen (1954) found a
relationship between the pessimistic
Brownfain self-ratings and the dis-
crepancy between self- and ideal-
self-ratings on the Bills index. Ben-
dig and Hoffman (1957) found that
Bills’ scores on acceptance of self- 
ratings and on congruence between 
ratings of self and ideal self related 
equally well to scales of the Mauds- 
ley Personality Inventory. They 
therefore concluded that the two 
different Bills index measures are 
redundant. 

But on the negative side, Cowen 
(1956) found no relation between the 
so-called stability of self-concept on 
the Brownfain and the different
measures on the Bills. Hampton 
(1955) likewise failed to find any 
significant relationship between abil- 
ity to make realistic appraisals about 
oneself and the ability to admit state- 
ments that were damaging but prob- 
ably true. 

Different measures of the self- 
concept have different theoretical 
and operational bases. Where meas- 
ures apply similar rationale, signifi- 
cant correlations between measures 
have been found. But in similar 
measures such extraneous variables 
as response set and social desirability 
will produce similar bias. Measures
of self-concept are reliable and, to
a certain degree, are interchangeable.
Whether or not the reasons for simi- 
larity are intrinsic to the scales, the 
notion of the internal frame of refer- 
ence seems well validated. 


Discussion 


The scientist can not hold truths 
to be self-evident. What is known of
the self through direct report must be
considered suspect on philosophi-
cal grounds, since the nature
of the “I” has been seen differently in
each ideological epoch. Notions con- 
cerning the self are like other human 
ideas, and are inventions and not 
discoveries. The task is not that of 
discovering the “true self,” but in- 
stead of constructing those notions 
which increase understanding of hu- 
man behavior. Just as the number of 
inventions is potentially unlimited, 
so there need be no limit on the num- 
ber of constructions put upon the 
self. In this discussion we will pro- 
ceed functionally, and consider the 
uses to which different selves have 
been put. 

The first self is the knowing self of 
structural psychology. Its function is 
to apprehend reality. The rational 
nature of man has always been in 
dispute, and the New Look in per- 
ception has further undermined this 
conception. This article has cited 
studies which throw doubt on the 
ability of the self to perceive itself 
correctly in those areas which are of 
great value to it. It is the change in 
the self as perceiver of itself that is 
the aim of client-centered therapy.
Studies of client-centered therapy do
not reveal whether therapy brings 
the client any closer to reality, but 
they do provide some evidence that 
the perception of the self is brought 
closer to social expectancies. 

The second construction of the self
is that of motivator. This is the self
of thinkers who believe that the
individual is motivated by a need
for self-assertion, or self-realization,
by realizing those potentialities which
inhere within the self. Attempts to
validate this construct of the self
have been carried on through work
on need achievement. This construct of
the self seems involved also in ratings
and Q sorts for an ideal self which
outdistances the real self. Here, of
course, the self whose reach exceeds
its grasp is considered to be patho-
logical, for it is shown how psycho-
therapy helps reduce the disparity
between the real and ideal.

The third construct of self is the 
humanistic, semireligious conception 
of the self as that which experiences 
itself. It is the “unique personal 
experience” of Moustakas (1957) and 
the experience of feeling in Rogers 
(1951). The difficulty for the psy- 
chologist is that such a conception is 
more religious than scientific; it be- 
comes a value-orientation, and, as 
the writer has shown elsewhere 
(Lowe, 1959), it becomes a highly 
controversial statement of what is 
the highest good. 

The fourth approach views the 
self as organizer. This self is the 
psychoanalytic ego; the internal 
frame of reference of Snygg and 
Combs (1949); and the source of 
construct making in G. A. Kelly 
(1955). Any operational measure of 
self-consistency would seem to imply 
the existence of such a self. It is this 
self that this article has been most 
directly concerned with; to the ex- 
tent that studies have been positive, 
the self does respond the same way in 
different situations. Conversely, to 
the extent that the studies have had 
negative results there is enough in- 
consistency in the self that it does not 
always act according to prediction. 

A fifth approach constructs the 
self as a pacifier. Such a self seems 
implied in Lewin (1936), who con- 
structed his system of personality in 
terms of valences or tensions which 
the organism seeks to keep to a mini-
mum. It seems present also in
Angyal (1941), who views life as an
oscillation about a position of equilib- 
rium. The self in other words is seen 
as an adjustment mechanism which 
seeks to maintain congruence be- 
tween the self and the nonself. It is 
the verification of this type of self 
that seems implied by Q sort studies 
that show increased congruence of
real and ideal self as a result of psy-
chotherapy. We must, however, note
that the self as pacifier stands in
direct opposition to the self as moti-
vator.

In the sixth view of the self, the self 
is the subjective voice of the culture, 
being purely a social agent. It is the 
self of both sociology and S-R 
psychology, for it sees behavioral 
responses solely in terms of social 
conditions or stimulus inputs. The
self as an entity is denied, and be- 
havioral consistency is seen as resid- 
ing not in the individual but in simi- 
lar environmental events. If the term 
self is used, it is seen in terms of ego- 
involvements with loyalties which 
are determinative of the self. 

From these different conceptions 
of the self, we can choose the one 
which best fits our theoretical frame 
of reference. But which conception 
is chosen seems to depend more upon 
faith than upon logic, and the choice 
of one conception must of necessity 
deny other constructs. It seems im- 
possible that the self can function as 
a motivator which constantly tries to 
change the status quo, and as a paci-
fier which minimizes the disparity
between the real and ideal self. There
is a contradiction also between the
self as motivator and the self as feel-
ing, for in the latter the self is ac-
cepted as it is, but in the former it is not.
Differences are apparent also between 
the self as feeling and as pacifier. 
And finally, the self as agent of soci- 
ety is opposed to all other concep- 
tions. 
CONCLUSION


Is the self-concept a fact which, 
having an objective existence in na- 
ture, is observed and measured; or is 
it an epiphenomenon of deeper real- 
ity, invented by man that he might 
better study his behavior? 

The world has sought to be so sure 
of the self because there is so little 
else of which it can be certain. The 
self has become the anchor that man 
hopes will hold in the ebb tide of
social change. But just as a fish 
could never know it was surrounded 
by water unless that water were to 
disappear, it is unlikely that Lecky 
(1945) would have known about self- 
consistency had he not lived in a 
culture which felt inconsistency. In 
Buberian terminology, the self is an 
It, which man invents because he 
can not find a Thou. 

The position of this paper must be 
that the self is an artifact which is 
invented to explain experience. If 
the self-concept is a tool, it must be 
well designed and constructed. We 
will conclude therefore with that 
construct of the self which best serves 
the 1960s. Such a construction com- 
bines the self of ego-involvement with 
the self of feeling. It is a self which is 
existential not to experience itself, 
but to mediate encounter between 
the organism and what is beyond. 
Such a self is what Pfuetze (1954)
calls the “self-other dialogic theory
of the self,” being interpreted nat-
uralistically through Mead and
transcendentally through Buber. It 
is as an artifact that the self-concept 
finds meaning. 
It is the purpose of this paper to
scrutinize the attempts which have 
been made to provide quantitative 
data relating to the inheritance of 
behavioral characteristics in infra- 
human animals, and to reanalyze 
these data in terms of the polygenic 
or multifactorial hypothesis of genet- 
ical determination. Much of the 
data derived from the psychological 
field shows continuous variation and 
is consequently of the sort which 
lends itself to such polygenic analy-
sis, as opposed to that employed in
the analysis of discrete character- 
istics typical of classical Mendelian 
genetics. While it should be noted 
that there are now several experi- 
mental methods and analyses which 
have been developed for dealing with 
polygenic inheritance, it is not our 
present intention to undertake an 
evaluative survey of their relative 
merits as applied to psychogenetics 
at its current stage of development. 
Instead we propose to concentrate on
and to employ one set of such tech-
niques, those of biometrical genetics
as developed by Mather (1949) ex-
pressly for the analysis of continuous
variation, especially in plants, and
which we judge to be particularly
promising in their application to the
inheritance of behavior. An introduc-
tion to the general model and assump-
tions of this biometrical approach as
applied to psychogenetics will be
found in Broadhurst (1960).

1 A summarized version of this paper was
read to the Society for Experimental Biology
in London in January 1961.

2 Harkness Fellow of the Commonwealth
Fund in the Division of Biology, California
Institute of Technology, Pasadena, California,
1959-60.


EXPERIMENTAL METHOD 


Few experiments in psychogenetics 
have been of a kind which can lead to 
a partitioning of the variation into its 
heritable and nonheritable compo- 
nents. Even fewer have been designed 
in such a way that the various tests 
are sensitive and the analysis reliable. 
Satisfactory experimental procedures 
for applying biometrical analysis in 
psychogenetics have recently been 
discussed by Broadhurst (1960) and 
we will merely note the following 
points: (a) the experimental animals
should be randomized, and the ex-
periments replicated; (b) the parental
stocks must be inbred; and (c) for in-
vestigating a cross between two in-
bred strains at least the two parental
(P), first and second filial (F₁ and F₂),
and backcross (B) generations should
be reared.
The published data which come
nearest to satisfying these require-
ments are those of Dawson (1932),
Brody (1942), Goy and Jakway
(1959), Jakway (1959), and Thomp-
son and Fuller.³ Although no spe-
cial precautions were taken by Daw-
son and by Brody to ensure the homo-
zygosity of the parental strains, the
genetical differences between them 
were much greater than those within 
them; and insofar as our interest is 
in the differences between the paren- 
tal strains, differences within them 
may be regarded as a further source of
error variation, like the nonheritable
differences with which they are con- 
founded. The procedures appropriate 
to the analysis of these five sets of 
data are outlined in the next sections. 


Analysis of Means 


Following Hayman and Mather’s 
(1955) and Jinks and Jones’ (1958) 
extension of Mather (1949) we can 
write the generation means of a cross 
between two inbred strains in terms 
of six parameters: m, [d], [h], [i], [j],
and [l], which are the mean, additive,
dominance, and three nonallelic, first-
order interaction components be-
tween pairs of genes, respectively.

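Thus:

$$
\begin{aligned}
P_1 &= m + [d] + [i] - \tfrac{1}{2}[j] + \tfrac{1}{4}[l] \\
P_2 &= m - [d] + [i] + \tfrac{1}{2}[j] + \tfrac{1}{4}[l] \\
F_1 &= m + [h] + \tfrac{1}{4}[l] \\
F_2 &= m + \tfrac{1}{2}[h] \\
B_1 &= m + \tfrac{1}{2}[d] + \tfrac{1}{2}[h] + \tfrac{1}{4}[i] \\
B_2 &= m - \tfrac{1}{2}[d] + \tfrac{1}{2}[h] + \tfrac{1}{4}[i]
\end{aligned}
$$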


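Hence from these generation means
we can estimate the heritable com-
ponents as:

$$
\begin{aligned}
[d] &= B_1 - B_2 \\
[h] &= F_1 - 4F_2 - \tfrac{1}{2}P_1 - \tfrac{1}{2}P_2 + 2B_1 + 2B_2 \\
[i] &= 2B_1 + 2B_2 - 4F_2 \\
[j] &= 2B_1 - P_1 - 2B_2 + P_2 \\
[l] &= P_1 + P_2 + 2F_1 + 4F_2 - 4B_1 - 4B_2
\end{aligned}
$$

and their sampling variances (SE²) as follows, each V being the
squared standard error of the mean of the generation indicated:

$$
\begin{aligned}
V_{[d]} &= V_{B_1} + V_{B_2} \\
V_{[h]} &= V_{F_1} + 16V_{F_2} + \tfrac{1}{4}V_{P_1} + \tfrac{1}{4}V_{P_2} + 4V_{B_1} + 4V_{B_2} \\
V_{[i]} &= 4V_{B_1} + 4V_{B_2} + 16V_{F_2} \\
V_{[j]} &= 4V_{B_1} + V_{P_1} + 4V_{B_2} + V_{P_2} \\
V_{[l]} &= V_{P_1} + V_{P_2} + 4V_{F_1} + 16V_{F_2} + 16V_{B_1} + 16V_{B_2}
\end{aligned}
$$

3 W. R. Thompson and J. L. Fuller, personal
communication, 1959.

P. L. BROADHURST AND J. L. JINKS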

The standard errors of the compo- 
nents can thus be obtained and tests 
of their significance by the customary 
methods applied. 
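Because each component is a linear contrast of the generation means, treating the six means as independent makes each sampling variance the sum of the squared coefficients times the squared standard errors, which is how the variance expressions above arise. The Python sketch below encodes the six-parameter model and those contrasts; `MODEL`, `ESTIMATORS`, `expected_means`, and `estimate_components` are illustrative names of ours, and the round trip simply checks that the contrasts recover parameters used to generate error-free means.

```python
import math

GENERATIONS = ("P1", "P2", "F1", "F2", "B1", "B2")

# Coefficients of m, [d], [h], [i], [j], [l] in each generation mean.
MODEL = {
    "P1": (1, 1, 0, 1, -0.5, 0.25),
    "P2": (1, -1, 0, 1, 0.5, 0.25),
    "F1": (1, 0, 1, 0, 0, 0.25),
    "F2": (1, 0, 0.5, 0, 0, 0),
    "B1": (1, 0.5, 0.5, 0.25, 0, 0),
    "B2": (1, -0.5, 0.5, 0.25, 0, 0),
}

# Each component written as a linear contrast of generation means.
ESTIMATORS = {
    "d": {"B1": 1, "B2": -1},
    "h": {"F1": 1, "F2": -4, "P1": -0.5, "P2": -0.5, "B1": 2, "B2": 2},
    "i": {"B1": 2, "B2": 2, "F2": -4},
    "j": {"B1": 2, "P1": -1, "B2": -2, "P2": 1},
    "l": {"P1": 1, "P2": 1, "F1": 2, "F2": 4, "B1": -4, "B2": -4},
}


def expected_means(m, d, h, i, j, l):
    """Generation means implied by the six parameters."""
    params = (m, d, h, i, j, l)
    return {g: sum(c * p for c, p in zip(MODEL[g], params)) for g in GENERATIONS}


def estimate_components(means, sq_se):
    """Return {component: (estimate, standard error, estimate / SE)}."""
    out = {}
    for name, coefs in ESTIMATORS.items():
        value = sum(c * means[g] for g, c in coefs.items())
        se = math.sqrt(sum(c * c * sq_se[g] for g, c in coefs.items()))
        out[name] = (value, se, value / se)
    return out


means = expected_means(m=10.0, d=3.0, h=2.0, i=0.5, j=-0.4, l=0.8)
result = estimate_components(means, {g: 0.01 for g in GENERATIONS})
```

The third element of each tuple is the ratio of estimate to standard error that the customary significance test would use.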

If the gene effects are additive, 
that is, the genes are independent in 
their action, then the three com- 
ponents which estimate the effects 
of nonallelic interactions, [i], [j], and 
[l], will be nonsignificant and the fol- 
lowing identities known as scaling 
tests will hold within the limits of 
sampling error (Mather, 1949): 


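$$
\begin{aligned}
\text{Test A:}\quad & P_1 + F_1 - 2B_1 = 0 \\
\text{Test B:}\quad & P_2 + F_1 - 2B_2 = 0 \\
\text{Test C:}\quad & P_1 + P_2 + 2F_1 - 4F_2 = 0
\end{aligned}
$$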
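The three scaling tests can be computed directly from the generation means and the squared standard errors of those means. In the sketch below (function and data names are illustrative) a value more than about twice its standard error would usually be read as evidence of nonallelic interaction, assuming approximate normality of the means.

```python
import math

# Scaling tests as contrasts of generation means; each is zero without nonallelic interaction.
SCALING_TESTS = {
    "A": {"P1": 1, "F1": 1, "B1": -2},
    "B": {"P2": 1, "F1": 1, "B2": -2},
    "C": {"P1": 1, "P2": 1, "F1": 2, "F2": -4},
}


def scaling_tests(means, sq_se):
    """Return {test: (value, standard error, value / SE)} assuming independent means."""
    results = {}
    for name, coefs in SCALING_TESTS.items():
        value = sum(c * means[g] for g, c in coefs.items())
        se = math.sqrt(sum(c * c * sq_se[g] for g, c in coefs.items()))
        results[name] = (value, se, value / se)
    return results


# Hypothetical means that follow the additive-dominance model exactly
# (m = 10, [d] = 3, [h] = 2), so every test should come out at zero.
additive = {"P1": 13.0, "P2": 7.0, "F1": 12.0, "F2": 11.0, "B1": 12.5, "B2": 9.5}
sq_se = {g: 0.04 for g in additive}
no_interaction = scaling_tests(additive, sq_se)
```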


A joint test of these three identities 
has been devised by Cavalli (1952). 
In this we estimate weighted least 
squares values for m, [d], and [h] 
from the generation means, assum-
ing the absence of nonallelic interac-
tions. The weights used are the re-
ciprocals of the squared standard
errors of the generation means. The
weighted sum of squared deviations
of the observed from the expected
generation means is then a χ² with
(n − 3) degrees of freedom, where n is
the number of observed generation
means.
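The joint scaling test is an ordinary weighted least squares fit. The sketch below fits m, [d], and [h] with weights 1/SE², returns their standard errors from the inverse of X′WX, and computes the χ² on n − 3 degrees of freedom. As a usage example it takes the pooled-sex means and standard errors of Dawson's Table 1 (backcross to the tame strain without the four extreme males); labelling the slower, tame strain P₁ is our assumption, chosen so that [d] comes out positive as in the published estimate. Names are illustrative.

```python
import numpy as np

# Coefficients of m, [d], [h] for each generation with no nonallelic interaction.
DESIGN = {
    "P1": (1.0, 1.0, 0.0),
    "P2": (1.0, -1.0, 0.0),
    "F1": (1.0, 0.0, 1.0),
    "F2": (1.0, 0.0, 0.5),
    "B1": (1.0, 0.5, 0.5),
    "B2": (1.0, -0.5, 0.5),
}


def joint_scaling_test(means, ses):
    """Weighted least squares fit of m, [d], [h]; returns estimates, SEs, chi2, df."""
    gens = [g for g in DESIGN if g in means]
    X = np.array([DESIGN[g] for g in gens])
    y = np.array([means[g] for g in gens])
    w = 1.0 / np.array([ses[g] for g in gens]) ** 2  # weight = 1 / SE^2
    xtw = X.T * w                                    # X'W without building W
    cov = np.linalg.inv(xtw @ X)                     # sampling covariance of estimates
    beta = cov @ (xtw @ y)
    chi2 = float(np.sum(w * (y - X @ beta) ** 2))
    return beta, np.sqrt(np.diag(cov)), chi2, len(gens) - 3


# Pooled-sex means and SEs in seconds from Table 1 (tame strain labelled P1 here).
means = {"P1": 24.9, "P2": 5.9, "F1": 7.2, "F2": 12.4, "B1": 19.7, "B2": 6.4}
ses = {"P1": 0.8, "P2": 0.2, "F1": 0.2, "F2": 0.4, "B1": 1.4, "B2": 0.4}
(m, d, h), se, chi2, df = joint_scaling_test(means, ses)
```

With these rounded inputs the estimates land close to the published m = 15.99, [d] = 10.10, and [h] = −8.74, but the χ² comes out near 11 rather than the reported 9.4, presumably because Table 1 is rounded; either value exceeds the 5% point of χ² for 3 degrees of freedom (about 7.81). The sketch should not be read as reproducing the authors' exact computation.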

If this χ² is nonsignificant then
nonallelic interactions are absent
and we can interpret directly the
estimates of [d] and [h] obtained in
the scaling test. The ratio

$$
\frac{[h]}{[d]} = \frac{\Sigma h}{r_d\,\Sigma d}
$$

is the so-called “potence ratio”
(Wigan, 1944), and this is a measure
of dominance only if the genes are
associated in the parent lines, that is,
$r_d$ (the degree of association) = +1
(Jinks & Jones, 1958) and all [h] in- 
crements have the same sign. The 
potence ratio can theoretically take 
any value between zero and infinity. 
While a significant potence ratio in- 
dicates dominance of the individual 
genes predominantly in the same di- 
rection, zero potence does not neces- 
sarily indicate absence of dominance. 

If the x? from the joint scaling test 
is significant then nonallelic interac- 
tions are present and these can be 
analyzed by estimating [i], [j], and 
[l] and testing their significance. A 
comparison of the signs of [l] and [h]
will then tell us the type of nonallelic 
interaction involved. If their signs 
are the same then cooperative or 
complementary interaction between 
the genes predominates while if their 
signs differ competitive or duplicate 
interaction predominates (Jinks & 
Jones, 1958). The component [j] = $r_j\Sigma j$,
on the other hand, provides us with
an indication of the distribution of
the interacting genes in the parental
lines. Thus with complete associa-
tion $r_j = +1$ and [j] may have a sig-
nificant value, but with maximum
dispersion $r_j = 0$ and [j] must be zero.
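The two interpretive rules above reduce to a ratio and a sign comparison. This sketch is bookkeeping only: it cannot check the conditions ($r_d = +1$, consistent signs of the h increments) on which reading the potence ratio as dominance depends. The function names are ours, and the example uses Dawson's log-scale estimates reported later in the paper.

```python
def potence_ratio(h, d):
    """[h]/[d]; interpretable as dominance only under the conditions stated above."""
    if d == 0:
        raise ValueError("[d] is zero, so the ratio is undefined")
    return h / d


def interaction_type(h, l):
    """Predominant nonallelic interaction judged from the signs of [h] and [l]."""
    if h == 0 or l == 0:
        return "undetermined"
    # Same sign: complementary; opposite signs: duplicate.
    return "complementary" if (h > 0) == (l > 0) else "duplicate"


# Dawson's log-scale estimates: [h] = -0.222, [d] = 0.340.
ratio = potence_ratio(-0.222, 0.340)
```

A magnitude of about 0.65 with a negative sign would suggest partial dominance toward the lower-scoring parent, if those conditions held.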
Scales


Although allowance can be made 
for nonallelic interactions and geno- 
type-environmental interactions in 
the analysis of second degree statis- 
tics (Hayman & Mather, 1955; 
Mather & Jones, 1958), with the 
paucity of second degree statistics 
available in the psychogenetical ex- 
periments to be analyzed we can 
merely attempt to find an empirical 
scale on which these effects make no 
significant contribution to the varia- 
tion. 

Clearly, our criterion of an ade- 
quate scale which eliminates non- 
allelic interaction is one which leads 
to a nonsignificant χ² in the joint
scaling test. However, a scale which 
is empirically adequate for this pur- 
pose will not necessarily remove any 
genotype-environmental interaction 
which may be present, that is, lead to 
homogeneity of the variances of the 
parents and F₁s. We must, therefore,
adopt a scale which at least minimizes 
and balances these two sources of 
bias. 
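One simple way to compare candidate scales is to see how far each equalizes the variances of the nonsegregating generations (P₁, P₂, F₁), which should differ only environmentally. The sketch uses Hartley's F-max ratio (largest over smallest variance) as a stand-in for whichever homogeneity test is preferred, applied to simulated log-normal scores whose spread grows with the mean; nothing here is taken from the experiments analyzed, and the function names are illustrative.

```python
import numpy as np


def fmax(samples):
    """Hartley's F-max: largest sample variance divided by the smallest."""
    variances = [np.var(s, ddof=1) for s in samples]
    return max(variances) / min(variances)


def compare_scales(p1, p2, f1):
    """F-max of the nonsegregating generations on several candidate scales."""
    transforms = {"linear": lambda x: x, "square root": np.sqrt, "log": np.log10}
    groups = [np.asarray(g, dtype=float) for g in (p1, p2, f1)]
    return {name: fmax([t(g) for g in groups]) for name, t in transforms.items()}


# Simulated positive scores whose standard deviation is proportional to the mean.
rng = np.random.default_rng(0)
p1 = rng.lognormal(mean=np.log(6.0), sigma=0.3, size=200)
p2 = rng.lognormal(mean=np.log(25.0), sigma=0.3, size=200)
f1 = rng.lognormal(mean=np.log(7.0), sigma=0.3, size=200)
ratios = compare_scales(p1, p2, f1)
```

For data of this shape the log scale brings F-max close to 1, whereas the linear scale leaves a large ratio; a full check would also repeat the joint scaling test on each candidate scale.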


Analysis of Variances into Components of Variation


On an adequate scale the variances
(s²) of the parent, F₁, F₂, and back-
cross generations are (Mather, 1949):

$$
\begin{aligned}
V_{P_1} = V_{P_2} = V_{F_1} &= E_1 \\
V_{F_2} &= \tfrac{1}{2}D + \tfrac{1}{4}H + E_1 \\
V_{B_1} + V_{B_2} &= \tfrac{1}{2}D + \tfrac{1}{2}H + 2E_1
\end{aligned}
$$

where D = Σd², H = Σh², and E₁ are
the additive, dominance, and non-
heritable components of variation,
respectively.
In addition we have

$$
V_{B_2} - V_{B_1} = \Sigma(dh)
$$


Solution of these equations leads to 
estimates of D, H, and E₁, from which
we can obtain estimates of dominance
and heritability. The dominance
ratio, H/D, will be zero for no dom-
inance, one for complete dominance
and greater than one for overdomi-
nance (heterosis). Heritability can be
assessed in a variety of ways, of which
we will use two: D/(D + E₁), i.e., the
ratio of the additive variation to the
sum of the additive plus nonheritable
variation, and (½D + ¼H)/(½D + ¼H
+ E₁), which is the proportion of
heritable variation in an F₂ popula-
tion. These ratios therefore represent
estimates of heritability “in the nar-
row sense” and “in the broad sense,”
respectively.

When Σ(dh) does not equal zero, it
supplies additional evidence for the 
presence of dominance. It also shows 
which parent carries the preponder- 
ance of dominant allelomorphs, for 
the backcross to this parent has the 
lower variance. 
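Given the within-generation variances, the equations above can be solved by hand: subtracting the F₂ equation from the backcross equation isolates H, after which D follows, and the backcross difference gives Σ(dh). The sketch pools E₁ as the unweighted mean of the three nonsegregating variances, which is our simplification (pooling weighted by degrees of freedom would be a natural alternative); the function names are illustrative.

```python
def variance_components(v_p1, v_p2, v_f1, v_f2, v_b1, v_b2):
    """Solve the second degree statistics for D, H, E1 and Sigma(dh).

    Arguments are within-generation variances of individual scores.
    """
    e1 = (v_p1 + v_p2 + v_f1) / 3.0     # simple unweighted pooling (an assumption)
    h = 4.0 * (v_b1 + v_b2 - v_f2 - e1)  # backcross sum minus F2 equation leaves H/4
    d = 2.0 * (v_f2 - e1) - 0.5 * h      # from V_F2 = D/2 + H/4 + E1
    dh = v_b2 - v_b1                     # Sigma(dh)
    return d, h, e1, dh


def dominance_and_heritability(d, h, e1):
    """Return H/D, D/(D + E1) and (D/2 + H/4)/(D/2 + H/4 + E1)."""
    narrow = d / (d + e1)
    genetic_f2 = 0.5 * d + 0.25 * h
    broad = genetic_f2 / (genetic_f2 + e1)
    return h / d, narrow, broad


# Round trip with made-up components D = 0.4, H = 0.2, E1 = 0.1, Sigma(dh) = 0.06.
components = variance_components(0.1, 0.1, 0.1, 0.35, 0.22, 0.28)

# Dawson's log-scale components as reported later in the paper.
_, narrow, broad = dominance_and_heritability(0.052, -0.008, 0.020)
```

Applied to Dawson's reported components, the two ratios come out at about 72% and 55%, the figures the paper gives for his data.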


Number of Effective Factors 


Only one estimate of the number of 
effective factors is applicable to the 
type of data so far obtained in 
psychogenetics, namely, the estimate 
of K₁ (Mather, 1949). This equals
¼(P₁ − P₂)²/D, which for k genes of
equal effect and associated in the
parental lines equals k²d²/kd² = k.

In practice this estimate is always 
minimal because it assumes that the 
genes are associated (i.e., $r_d = 1$) and
that all genes give equal increments
(i.e., $d_a = d_b = d_c = \cdots$). It is, how-
ever, worth obtaining in the psycho-
genetical experiments because of the
practice of deliberately selecting the
most extreme lines available as
parents in the cross. Such selection
will lead to a preponderance of asso-
ciation in the parental lines, thus
partially satisfying one of the as-
sumptions required.
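The K₁ arithmetic can be checked in a few lines. Under the idealized conditions (k associated genes of equal effect) it returns k exactly; with real data it gives the minimal figure described above. In the Dawson line we substitute 2[d] from the log-scale joint scaling fit for the parental difference, which holds only when nonallelic interaction is absent; this is our shortcut, not necessarily the authors' computation. The function name is illustrative.

```python
def min_effective_factors(p1_mean, p2_mean, d_component):
    """Mather's K1 = (1/4)(P1 - P2)^2 / D, a minimum estimate of gene number."""
    if d_component <= 0:
        raise ValueError("D must be positive")
    return 0.25 * (p1_mean - p2_mean) ** 2 / d_component


# k associated genes of equal effect d: P1 - P2 = 2kd and D = k d^2, so K1 = k.
k, d = 5, 0.3
exact = min_effective_factors(2 * k * d, 0.0, k * d * d)

# Dawson's log scale: P1 - P2 taken as 2[d] = 0.680, with D = 0.052.
dawson = min_effective_factors(0.680, 0.0, 0.052)
```

The Dawson value, about 2.2, matches the minimal estimate quoted for his data later in the paper.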
EXAMPLES FROM THE LITERATURE


We can now illustrate the analyses 
described earlier by reference to par- 
ticular experiments in the field of 
psychogenetics. 


Dawson 


Dawson's work (1932) was most 
accomplished genetically, and still 
might serve as a model of how a 
psychogenetical investigation could 
be approached, at least, from the 
genetical aspect. Unfortunately, the 
more purely psychological treatment 
is not of comparable quality and 
leaves much to be desired. Dawson 
investigated the inheritance of wild- 
ness in mice, defining wildness in 
terms of the speed the animals 
showed in running down a straight 
runway. We shall give his descrip- 
tion of the method. 


The method of testing consisted in placing 
the mouse at one end of a runway and allowing 
it to run to the other end. The time required 
was recorded by means of a stopwatch. The 
runway was 24 feet long, 9½ inches wide and
13 inches high. The sides and ends were of 
galvanized sheet iron, the floor of soft wood. 
One foot from each end a black line was 
painted on the floor of the runway. The time 
required for the mouse to run from one line 
to the other, a distance of 22 feet, was re- 
corded. A movable partition made of wall- 
board and bound with rubber was used to 
prevent the mouse from running back during 
the test and to aid in starting the test and 
capturing the mouse afterwards. The follow- 
ing procedure was carried out in testing the 
mice in this device. On the date that the mice 
in a certain pen were to be tested they were 
carried to the runway a short distance away 
and tested one at a time. The mouse to be 
tested was confined by the movable partition 
in a space about one foot from the end of 
the runway until everything was ready when 
the partition was raised and the stopwatch 
started as soon as the mouse crossed the black 
line. The mouse was followed by the experi-
menter with the partition which was placed in
position to prevent the animal from running 
back if it showed any signs of doing so. Noth- 
ing was done to frighten the mouse other than 
the procedure described. This was usually 
sufficient to cause even the tame mice to run
or walk towards the other end of the runway. 
If the mouse ran swiftly, it was impossible to 
keep up with it with the partition; but if 
more slowly the partition was moved along 
and kept about twelve to fifteen inches be- 
hind the mouse. If the mouse stopped and 
showed no inclination to go forward the par- 
tition was slowly advanced until it touched 
the mouse. In all but three or four cases this 
was sufficient to start the mouse again. The 
few individuals where this was not the case 
were shoved a little and thus started. When 
the mouse crossed the line at the far end of 
the runway, the watch was stopped and the 
partition taken out of the runway allowing 
the mouse to run back to the starting point or 
in case it did not do so voluntarily it was urged 
by means of the partition. This prevented the 
mice from associating the far end of the run- 
way with being caught. Since each individual 
was tested three times this point was of con- 
siderable importance. After the mouse had 
been cornered at the starting point by means 
of the partition, it was caught and the number 
in its ear read.... These trials were con- 
ducted at weekly intervals after the mouse 
reached 75 days of age. In order to facilitate 
the testing and caring for the mice, a variation 
of one day in either direction was permitted. 
Thus the first trial for a given mouse might 
occur on the 74th, 75th or 76th day. A few 
trials had to be made on different dates. The 
trials were in nearly all cases made in the 
evening or at night when there was very little 
outside disturbance to distract the mice. The 
lighting was kept as far as possible the same 
throughout the experiment (pp. 299-300). 


It will be seen that a large subjec-
tive element could enter into the fac- 
tors determining the speed of running 
of a given subject in the alley 
through the way in which the parti- 
tion was manipulated. This is not 
entirely overcome by the procedure 
of identifying the animal after com- 
pleting the test, as, in the parental 
generations at any rate, there were 
distinctive coat color differences 
between the strains. Nevertheless, 
the corrected reliability coefficient 
for the three trials for all the 1,232 
subjects used is reported as 0.92
± 0.04 (SE).

The subjects were a strain of wild 
mice which had been reared in the
laboratory for several years and three 
strains of tame mice, an albino, and 
two strains of brown mice with pink 
eyes, one of them also having short 
ears. The wild strain of mice ap- 
peared to be more highly inbred than 
the tame mice since selection among 
the former produced no response 
while the latter responded. This, 
however, as we have seen earlier, is 
not a serious problem in view of the
large and highly significant differences
between the wild and tame strains 
for the measure under consideration. 

Dawson was able to extract a large 
amount of information from his re- 
sults regarding the nature of the 
genetical control of behavior in his 
runway situation. He showed that 
there was no linkage with sex or with 
any of the major gene effects identi-
fiable in his strains, and, by reciprocal
crossing, that there were probably no 
directional maternal effects (Broad- 
hurst, 1961). He concluded that the 
wild-type behavior was dominant and 
that only a few genes are involved in 
determining the reaction. This last 
conclusion is principally based on the 
result of fitting curves, derived from 
Mendelian ratios and assuming vari- 
ous numbers of genes up to three, to 
the observed distributions. However, 
he admitted that probably a number 
of modifying genes were also in- 
volved. Implicit in his estimate of 
the number of genes were assump- 
tions concerning size of individual 
gene effects, the distribution of genes 
in the parental lines and their dom-
inance relations. A biometrical anal-
ysis along the lines proposed here
would, therefore, appear to be ap-
propriate.

In Table 1 will be found the rel-
evant generation means and their
standard errors calculated from the
data given by Dawson. The three
strains of tame mice were used in
these crosses and the results pooled.

TABLE 1
Dawson's Data: Means and Their Standard Errors in Seconds, with (n)

| | P₁ (Wild) | P₂ (Tame) | F₁ | F₂ | B₁ | B₂ | B₂, four males omitted |
|---|---|---|---|---|---|---|---|
| Males | 6.7 ± 0.3 (43) | 24.5 ± 1.0 (63) | 7.6 ± 0.3 (76) | 13.0 ± 0.6 (175) | 6.6 ± 0.3 (26) | 27.4 ± 3.9 (54) | 20.8 ± 1.6 (50) |
| Females | 5.3 ± 0.3 (47) | 25.3 ± 1.2 (54) | 6.9 ± 0.3 (88) | 11.8 ± 0.5 (190) | 6.2 ± 0.5 (24) | 18.7 ± 1.5 (48) | |
| Both | 5.9 ± 0.2 (90) | 24.9 ± 0.8 (117) | 7.2 ± 0.2 (164) | 12.4 ± 0.4 (365) | 6.4 ± 0.4 (50) | 23.3 ± 2.2 (102) | 19.7 ± 1.4 (98) |

Apart from the backcross to the 
slower parent (Bz), the sexes are in 
good agreement and this failure in 
the backcross can be traced to four 
males whose scores were greater than
90 seconds. Individuals with such high scores
are met with nowhere else in Daw- 
son’s experiments which included 
some second backcrosses, F3s and Fis 
in addition to the data given in the 
table. The results omitting the four 
males are also shown in Table 1. 
Their omission improves the already 
good agreement between sexes and 
the analyses can now be carried out 
on the pooled sexes. We may add 
that omitting these four individuals 
does not affect the interpretation of 
the data since the sex difference they 
suggest is borne out neither by Daw- 
son’s nor our own detailed analyses. 

The joint scaling test (Cavalli, 1952) gives the following weighted least squares estimates from the pooled sexes,

$$m = 15.99,\qquad [d] = 10.10,\qquad [h] = -8.74$$

which, when compared with the observed generation means, give a χ²(3) of 9.4 (p = 0.05-0.02). There is therefore some nonallelic interaction present. Its magnitude would not normally warrant rescaling, but these data also show significant genotype-environmental interaction (p < 0.01) on the linear scale, as determined by Pearson and Hartley's test (1958) for inhomogeneity of the P₁, P₂, and F₁ variances. Two transformations have, therefore, been tried, a square root and a log transformation. Of these the latter was the more satisfactory, and the joint scaling test repeated on the new scale gave m = 1.197 ± 0.032, [d] = 0.340 ± 0.030, and [h] = -0.222 ± 0.059, which gave a satisfactory fit with the observed data (χ²(3) = 0.34).
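The joint scaling test is a weighted least squares fit of the additive-dominance model (m, [d], [h]) to the six generation means, followed by a χ² comparison of observed and expected means. The sketch below illustrates the idea and is not the authors' computation. It assumes weights of 1/SE², scores the tame (higher-scoring) parent as m + [d] because that convention matches the positive [d] quoted above, and uses the rounded pooled-sex values of Table 1 with the B₂ figure that omits the four high-scoring males. The dictionary keys are hypothetical labels. Because the tabulated means and standard errors are rounded, the output only approximates the published figures. It should come out close to m = 15.99, [d] = 10.10 and [h] = -8.74, with a χ² above the 5% point for 3 df (7.81), though not exactly the 9.4 quoted.

```python
# Coefficients of m, [d], [h] for each generation under the additive-dominance model.
# Assumption: the higher-scoring (tame) parent is scored m + [d].
MODEL = {
    "P_tame": (1.0, 1.0, 0.0),
    "P_wild": (1.0, -1.0, 0.0),
    "F1": (1.0, 0.0, 1.0),
    "F2": (1.0, 0.0, 0.5),
    "B_tame": (1.0, 0.5, 0.5),   # backcross to the tame parent (B2 in Table 1)
    "B_wild": (1.0, -0.5, 0.5),  # backcross to the wild parent (B1 in Table 1)
}


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def joint_scaling_test(data):
    """data maps generation -> (mean, standard error); returns ((m, [d], [h]), chi2)."""
    k = 3
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for gen, (mean, se) in data.items():
        x = MODEL[gen]
        w = 1.0 / se ** 2  # weight = reciprocal of the sampling variance of the mean
        for i in range(k):
            xtwy[i] += w * x[i] * mean
            for j in range(k):
                xtwx[i][j] += w * x[i] * x[j]
    est = solve(xtwx, xtwy)
    chi2 = 0.0
    for gen, (mean, se) in data.items():
        expected = sum(c * e for c, e in zip(MODEL[gen], est))
        chi2 += (mean - expected) ** 2 / se ** 2
    return tuple(est), chi2  # chi2 has len(data) - 3 degrees of freedom


# Pooled-sex means and standard errors from Table 1 (rounded as printed).
dawson = {
    "P_wild": (5.9, 0.2),
    "P_tame": (24.9, 0.8),
    "F1": (7.2, 0.2),
    "F2": (12.4, 0.4),
    "B_wild": (6.4, 0.4),
    "B_tame": (19.7, 1.4),
}

(m, d, h), chi2 = joint_scaling_test(dawson)
print(f"m = {m:.2f}, [d] = {d:.2f}, [h] = {h:.2f}, chi2(3) = {chi2:.1f}")
```

The small hand-written solver keeps the sketch free of third-party dependencies. With a linear algebra library available, the same normal equations could be solved directly.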

The same scalar change also removed the genotype-environmental interaction; hence a solution for D, H, and E₁ was attempted from the second degree statistics. These gave values⁴ of

$$D = 0.052 \pm 0.024,\qquad H = -0.008 \pm 0.032,\qquad E_1 = 0.020 \pm 0.005,\qquad \Sigma(dh) = -0.032$$

This gives heritability estimates of

$$\frac{D}{D + E_1} = 72\%,\qquad \frac{\tfrac{1}{2}D + \tfrac{1}{4}H}{\tfrac{1}{2}D + \tfrac{1}{4}H + E_1} = 55\%.$$

The confidence limits (p = .05) for the first estimate of heritability are 61% and 83%. A further estimate of heritability can be extracted from Dawson's data. He assortatively mated his F₂ individuals to raise an F₃ generation, and from the results we can estimate the parent/offspring correlation. This turns out to be 0.51.

⁴ This is the only place in this paper where the data permit the estimate of standard errors for these components. No further suitable replication of observations, e.g., by the provision of the raw data for both sexes, as in this case, is encountered.
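Both heritability estimates above are simple ratios of the components of variation. The following minimal Python sketch applies them to the log-scale values of D, H and E₁ quoted above. The function name is a hypothetical label, and the second ratio is the heritable share of the F₂ variance, ½D + ¼H out of ½D + ¼H + E₁.

```python
def heritabilities(D, H, E1):
    """Return (D/(D+E1), (D/2 + H/4) / (D/2 + H/4 + E1)) from components of variation."""
    h_additive = D / (D + E1)
    heritable_f2 = 0.5 * D + 0.25 * H  # heritable part of the F2 variance
    h_f2 = heritable_f2 / (heritable_f2 + E1)
    return h_additive, h_f2


# Dawson's log-scale components, as quoted in the text.
h_a, h_f2 = heritabilities(D=0.052, H=-0.008, E1=0.020)
print(f"{h_a:.0%} {h_f2:.0%}")  # 72% 55%
```

The same function reproduces the Intromission row of Table 7 (56% and 50%) from D = 0.0014, H = 0.0016 and E₁ = 0.0011.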

Our estimate of the minimal num- 
ber of effective factors is 2.2 with 
confidence limits of 3.5 and 1.6. This 
is in good agreement with Dawson's 
estimate which as we have seen is also 
minimal. 
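The text does not print the estimator behind the "minimal number of effective factors." The sketch below assumes K = [d]²/D, the usual minimal estimator under additive gene action. Under that assumption it reproduces both the 2.2 quoted here for Dawson (log-scale [d] = 0.340, D = 0.052) and the 1.95 quoted later for Brody ([d] = 81.54, D = 3410.72). The function name is hypothetical.

```python
def min_effective_factors(d, D):
    """Minimal number of effective factors, assumed here to be K = [d]**2 / D."""
    return d ** 2 / D


print(round(min_effective_factors(0.340, 0.052), 1))    # Dawson, log scale: 2.2
print(round(min_effective_factors(81.54, 3410.72), 2))  # Brody, linear scale: 1.95
```

The estimate is a lower bound because [d]² equals ΣD only when all increasing alleles sit in one parent. Dispersion of alleles between the parents makes [d], and hence K, smaller.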

Thus, the behavioral difference between the wild and tame mice investigated by Dawson is controlled by at least three effective factors. Their contributions are additive and independent of the environment on a logarithmic scale, but they interact with one another and with the environment on a linear scale. Estimates of [i], [j], and [l] on the linear scale show that [j], which equals 7.6 ± 2.9, is responsible for the nonallelic interactions. The genes have significant additive and dominance effects. The dominance effect is not apparent in the second degree statistics, presumably because of sampling variation acting on the negative correlation between the estimates of D, H, and E₁. The potence ratio is negative and clearly different from zero, which means that there is a preponderance of dominant genes in the low scoring, i.e., wild type, parent. The significant estimate of Σ(dh) confirms this and also shows the presence of a dominance component of variation. Heritability is quite high, and estimates from different sources give consistent results.⁵


⁵ Since this paper was submitted for publication, estimates of the number of genes and heritability in the F₂, derived from another reanalysis of Dawson's data, have been published by Fuller and Thompson (1960). They used methods of analysis (Wright, 1952) similar to those proposed here, and their estimates are in substantial agreement with our own.


Brody

The analysis of Brody's experiments (1942) follows much the same pattern. She investigated the inheritance of voluntary cage activity in rats using the high and low selections for activity begun by Rundquist (1933). The number of revolutions of the activity cages during the last 15 of a 21-day period was taken as the measure. Each rat was housed in these cages at some time between 60 and 100 days of age. Some measure of inbreeding was practiced from the fifth generation of selection, although neither its degree nor the precautions taken to control environmental variation are stated. Our present concern is with the crosses made using the inactive and active strains at the twenty-first generation of selection. A complete program of breeding F₁, F₂, and backcrosses, B₁ and B₂, was carried out and, moreover, repeated using these two strains at the twenty-second generation of selection as parents. In each case the strains were crossed reciprocally to give the F₁s.

Brody's results show reasonably good agreement between the values obtained in the two replications of her crossing program. An analysis of variance of the complete data gave significance for only two items: the difference between sexes, and the difference between generations, i.e., P₁, P₂, F₁, F₂, B₁ and B₂. There were no significant differences between the replicate crossing programs initiated at the twenty-first and the twenty-second generations of selection, and no significant interactions between the three main effects. We can, therefore, pool the two sets of crosses and sexes for the biometrical analyses.

The joint scaling test on the pooled data given in Table 2 gave weighted least squares estimates of m = 74.71 ± 5.55, [d] = 63.03 ± 5.56 and [h] = -3.41 ± 9.32. These did not provide a satisfactory fit with the observed generation means, χ² = 15.27 (p = 0.01-0.001). Clearly genic interaction is present on this scale. Since these data are not published in a form amenable to rescaling, we cannot attempt to remove the interaction; we can, however, investigate its nature.

TABLE 2
BRODY'S DATA: MEANS AND THEIR STANDARD ERRORS IN REVOLUTIONS X10 4 AND (n)

P₁ (Inactive)     P₂ (Active)        F₁               F₂               B₁              B₂
13.2±4.1 (136)    136.4±12.5 (67)    73.6±7.1 (193)   71.5±9.3 (260)   26.8±9.3 (79)   115.2±10.2 (136)

Estimates of the components of the generation means and their standard errors showed that only m, [d], and [j] were significant; hence weighted least squares estimates of these three components were made assuming the other components were zero. These estimates, m = 72.14 ± 3.61, [d] = 81.54 ± 13.62 and [j] = 44.34 ± 28.90, provided a satisfactory fit with the observed generation means, χ² = 5.28 (p = 0.20-0.10).

A further argument in favor of rescaling is provided by the second degree statistics, which show significant (p = 0.01-0.001) genotype-environmental interaction. Since, however, rescaling is impossible, we must be cautious in interpreting the components of variation because of the possible bias from the [j]-type nonallelic interaction and the genotype-environmental interactions. H proved to be nonsignificant and negative; therefore D and E were recalculated assuming H = 0, with the following result: D = 3410.72 and E = 1945.91.⁶ These values give 64% and 44% as our two estimates of heritability and 1.95 as the minimal number of effective factors. The uncertainty of the latter estimate makes it impossible decisively to rule out Brody's own interpretation based on a single gene difference between her selected parents. On the other hand, we can discount a single gene interpretation on the basis of the significant genic interaction, since nonallelic interaction requires at least two genes.

⁶ Despite Brody's replication, no estimate of standard errors can be given for these components which is not potentially subject to serious inflation due to the various differences between the variances noted. The impossibility of rescaling therefore renders the replicates unsatisfactory as a source of an estimate of error variation.

Thus the difference between the spontaneous cage activity of the selected strains, measured on a linear scale, depends on at least two interacting genes which also interact with the environment. As in Dawson's experiment, it is the [j]-type interaction which is responsible for the genic interaction. There is no evidence of dominance, and the potence ratio is zero. It is possible, however, that dominant and recessive alleles are equally frequent in the two selected strains, in which case their dominance effects would cancel in [h]. Heritability is about the same as in Dawson's experiment, but since the scale on which it is measured is unsatisfactory we cannot place too much reliance on its absolute magnitude.

An unusual feature of Brody's results requires further comment. Both her F₁s were made reciprocally. Both of them show the same paternal effect, that is, in the direction of a negative influence of the mother: mothers from the active strain tend to have offspring lower in activity than those from inactive mothers. Pearson and Hartley's (1958) exact test for homogeneity of variance showed that only the data from the twenty-first generation crosses were suitable for analysis of variance. This was applied following Snedecor's method (1956) for dealing with unequal numbers in subgroups in a two-way classification. The analysis revealed an interaction between sex and strain, as shown in Table 3. It may be summarized by saying that the significant tendency of the F₁ to be unlike the mother in respect of activity was more pronounced in sons than in daughters. Sex linkage or paternal inheritance may be responsible for this complex situation, but to distinguish between the two would require a much more involved experimental design than that used by Brody.
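The [j]-type interaction invoked above for Brody's and Dawson's data is easiest to see in the six-parameter model of generation means. Its equations are not given in this paper, so the model written in the docstring below is an assumption, namely the standard additive, dominance and digenic-interaction model with P₁ scored m + [d] and B₁ the backcross to P₁. The sketch gives the perfect-fit solution of the six parameters from six means, together with its inverse. The function names are hypothetical.

```python
def six_parameter_fit(P1, P2, F1, F2, B1, B2):
    """Perfect-fit solution for m, [d], [h], [i], [j], [l] from six generation means.

    Assumed model (P1 scored m + [d]; B1 is the backcross to P1):
      P1 = m + [d] + [i]              P2 = m - [d] + [i]
      F1 = m + [h] + [l]              F2 = m + [h]/2 + [l]/4
      B1 = m + [d]/2 + [h]/2 + ([i] + [j] + [l])/4
      B2 = m - [d]/2 + [h]/2 + ([i] - [j] + [l])/4
    """
    m = P1 / 2 + P2 / 2 + 4 * F2 - 2 * B1 - 2 * B2
    d = (P1 - P2) / 2
    h = 6 * B1 + 6 * B2 - 8 * F2 - F1 - 1.5 * P1 - 1.5 * P2
    i = 2 * B1 + 2 * B2 - 4 * F2
    j = 2 * B1 - P1 - 2 * B2 + P2
    l = P1 + P2 + 2 * F1 + 4 * F2 - 4 * B1 - 4 * B2
    return m, d, h, i, j, l


def generation_means(m, d, h, i, j, l):
    """Expected (P1, P2, F1, F2, B1, B2) under the same model."""
    return (
        m + d + i,
        m - d + i,
        m + h + l,
        m + h / 2 + l / 4,
        m + d / 2 + h / 2 + (i + j + l) / 4,
        m - d / 2 + h / 2 + (i - j + l) / 4,
    )


# [j] enters only the backcrosses, with opposite signs in B1 and B2,
# so it pulls the backcrosses apart without moving P1, P2, F1 or F2.
means = generation_means(m=70.0, d=60.0, h=0.0, i=0.0, j=40.0, l=0.0)
print(means)
print(six_parameter_fit(*means))
```

This property explains why a [j]-type interaction shows up as backcross means that are more extreme than the additive model predicts, while the parental and F₁ means still fit.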


Thompson and Fuller 


The systematic program of re- 
search in psychogenetics which has 
been proceeding at the Roscoe B. 
Jackson Memorial Laboratory at Bar 
Harbor, Maine, for the last decade, 
has, as might be expected, produced 
work of high quality. Only one set of 
data, however, has so far become 
available⁷ which lends itself to the
complete biometrical analysis pro- 
posed in this paper. This is the work 
of Thompson and Fuller (Fuller & 
Thompson, 1960, pp. 267-269; 
aa 1953, 1956, see Footnote 


⁷ We are indebted to W. R. Thompson for making a draft copy of the MS containing these data available to us prior to publication.

TABLE 3
BRODY'S DATA: MEANS OF RECIPROCAL CROSSES IN REVOLUTIONS X10 AND (n)

                      F₁ offspring
Strain of mother      Males        Females
Active                33.9 (7)     109.4 (24)
Inactive              87.9 (21)    120.7 (30)

Thompson and Fuller employed the two inbred strains of mice showing extremes of high and low activity from a previous study in which a total of 15 strains had been studied, and tested a large number of subjects on each of two tests which they describe as follows:
[Test 1] consisted of an open-field 30 by 30 inches with walls 3½ inches high, and a hinged wire-mesh top. The floor was painted gray and the walls a flat black. The floor was divided by lines into a grid of 36 squares, each 5 by 5 inches. At the base of every other square was placed a barrier, 5 by 3½ by 3½ inches, painted a flat black. Leading into the open-field at one corner was a starting box with a separate hinged top. Test 2 was a Y-maze with arms 11½ inches long by 3 inches wide by 3½ inches deep. Angles between arms were equal. One arm was painted black inside, another gray and the third white. The maze was covered by a removable wire-mesh top. An animal was started at the end of the gray arm farthest from the junction point. Observation of animals in both tests were made under dim illumination as follows: in Test 1, a record was made by a mechanical counter of the number of lines crossed by a mouse in a 10 minute period. In Test 2, a count was made of the number of half-arm units traversed during each of six 100 second periods. . . . The correlation between the two tests was approximately 0.60 (Thompson & Fuller, see Footnote 3).


The two parental strains are known to be highly inbred, but the measures taken to control environmental variation are not specified. The F₁, F₂, and backcrosses were bred from the two strains and given both tests. The possibility of order effects of one test upon the other in the resulting data is not discussed. Reciprocal crosses were made and the results pooled, as were those of the two sexes, as shown in Table 4. Thompson and Fuller subjected the data from the first test to a square root transformation in order to equalize the variances of the parental and F₁ generations, which it did successfully. The transformed data are included in Table 4.

The joint scaling test for Test 1, Test 2, and the transformed Test 1 data gave the following weighted least squares estimates:

Component    Test 1            Test 1 (transformed)    Test 2
m            267.6±10.9        12.50±0.41              166.1±4.1
[d]          257.8±10.5        9.98±0.39               82.6±4.0
[h]          29.2±20.5         5.13±0.76               2.0±7.0
χ²(3)        24.3 (p<0.001)    6.6 (p=0.10-0.05)       16.9 (p<0.001)

On the linear scales both Test 1 and Test 2 show unsatisfactory fits with additive gene action; hence nonallelic interactions are present. Analysis of these nonallelic interactions shows that the [j]-type interaction is again largely responsible for the failure of the linear scale, having values of 25.6 ± 8.3 in Test 1 and 55.8 ± 9.0 in Test 2. Unfortunately the data given by Thompson and Fuller do not suffice to allow rescaling. Analysis of their square root transformed data for Test 1 has, however, shown that rescaling can reduce the genic interaction. Since this scale also removes the significant genotype-environmental interaction in Test 1, we can confidently analyze the second degree statistics on the square root scale.

Only Test 1 on the linear scale, however, provides estimates of the components of variation D, H, and E₁ which are sensible, that is to say, give positive values for D; the other two tests do not. This result is not unexpected for Test 2, where interactions are present, but it is unexpected for the square root transformed data of Test 1, where the interactions are largely scaled out. We will, therefore, merely give estimates of the heritabilities for the F₂ population, since this does not require the partitioning of the heritable components of variation (see above). For Test 1 the values are 73% and 53% for the linear and square root scales, respectively, and for Test 2, 26%. The low value in the latter test provides an additional reason for the failure to obtain sensible estimates of the additive and dominance components of variation. With no estimate of D we cannot evaluate our estimate of the number of effective factors.
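The probability ranges attached to the χ² values in this paper can be checked directly, because every joint scaling test here has 3 degrees of freedom. For 3 df the upper-tail probability has the closed form P(χ² > x) = erfc(√(x/2)) + √(2x/π)·e^(−x/2). The minimal sketch below uses only the standard library, and the function name is hypothetical.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability P(chi-square with 3 df > x), closed form for 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# chi-square values quoted in this section, each on 3 degrees of freedom
for label, value in [
    ("Test 1, linear", 24.3),
    ("Test 1, square root", 6.6),
    ("Test 2, linear", 16.9),
]:
    print(f"{label}: chi2 = {value}, p = {chi2_sf_3df(value):.4f}")
```

The same function also confirms the ranges given earlier for Dawson (9.4, p between 0.05 and 0.02) and for Brody (15.27 and 5.28). For other degrees of freedom a general routine such as SciPy's chi-square survival function would be needed.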

Thus the genes controlling the two 
behavior patterns in mice investi- 
gated by Thompson and Fuller have 
a large additive effect but show no 
dominance on the linear scale. They 
do, however, interact with one an- 
other and with the environment in a 
way which can be largely removed by 
a square root transformation. Once 
again it is the [j]-type of interaction 
which is mainly responsible for the 
failure of the linear scale. On the 
square root scale there is a prepon- 
derance of dominance for higher ac- 
tivity in Test 1. The above conclu- 
sions are based on the analysis of 
means: in this case the failure of the 
analysis of the components of varia- 
tion merely serves to confirm the 
presence of interaction. 


Jakway and Goy 


A fourth set of data susceptible to a complete biometrical analysis has recently become available. It relates to the analysis of sexual behavior in the male and female guinea pig (Goy & Jakway, 1959; Jakway, 1959). Two highly inbred strains were crossed. Their history is documented as far back as 1906, and their near homozygosity was established as early as 1927 by the method of exchanging tissue grafts (Loeb & Wright, 1927). F₁s, F₂s, and both backcrosses were bred, reciprocally in each case.

In the study of the inheritance of sexual behavior in female guinea pigs, the response to the injection of a controlled amount of female hormones in previously ovariectomized subjects was assessed in terms of four behavioral measures. No details are given of precautions taken to minimize environmental effects. The test technique is described as follows:


The median age at the time of ovariectomy 
was 3.5 months in each genetic group. The 
distributions were not skewed. ... Tests of 
reproductive performance began one month 
later on the average. For the first 3 tests, each 
animal was injected with 100 I.U. of oestra- 
diol benzoate followed 36 hours later with 0.2 
I.U. of progesterone... . The volume of all 
injections was constant (0.5 c.c.), and injec- 
tions were given subcutaneously in the left 
axilla. Immediately after injection with
progesterone, the animals were placed in a 
standard observation cage (in groups of 6 to 
12 individuals) and observed continuously for 
14 hours. Each animal was tested once every 
hour to determine the time of appearance of 
the lordosis reflex... . The first lordosis ob- 
tained was regarded as the onset of oestrus, 
and animals failing to respond on any of the 14 
hourly tests were viewed as not in oestrus. 
For those animals responding on at least one 
hourly test, oestrus was regarded as termi- 
nated when they failed to lordose on two suc- 
cessive hourly tests (Goy & Jakway, 1959, pp. 
142-143).


TABLE 4
THOMPSON AND FULLER'S DATA: MEANS AND THEIR STANDARD ERRORS IN UNITS AS INDICATED AND (n)

18.9±0.7
395.9±18.8 (63)
11.0±0.7
148.3±14.3 (58)
206.2±2.7
131.5±3.7

The measures used are detailed in another paper (Goy & Young, 1957) as follows:

(1) Latency of heat is the length of the interval between the injection of progesterone and the elicitation of the first lordosis. (2) Duration of heat is the number of hours lordosis can be elicited. Toward the end of a heat period, lordoses become feeble and difficult to elicit, and an operational criterion is necessary to determine when an animal shall be
classified as unresponsive. For this purpose, 
each animal is stroked or fingered five times 
and if no lordosis is displayed the animal is 
considered to be out of heat. . . . (3) The dur- 
ation of maximum lordosis in seconds... . 
The lordosis reflex includes several compo- 
nents, an arching or straightening of the back, 
elevation of the pudendum, displacement of 
the rear feet laterally and caudally so that a 
wide stance is taken, and emission of a low 
gutteral growl. When an estrous female is 
stroked lightly in a caudo-cephalad direction 
all components of the lordosis are displayed 
nearly simultaneously. If the stroking is con- 
tinued (prolonged stimulation), the full re- 
flex will be maintained for a time which varies 
with the genetic background and the phase of 
estrus. If stimulation is continued until volun- 
tary termination is produced, the duration of 
the reflex can be measured with a stop-watch. 
The response may be considered terminated 
when any one of the following signs is evident: 
(a) a sudden or gradual return of the back and 
pudendum to a normal position; (b) a sudden 
return of the feet to the normal position and a 
loss of the wide stance characteristic of the 
reflex; (c) kicking with the hind feet; (d) dash- 
ing forward; (e) squatting; (f) urinating; and 
(g) an abrupt termination of the growl ac- 
companied by a soft squeal. A stop-watch is 
started immediately on display of lordosis and 
stopped as soon as the complete response is no 
longer apparent....(4) Male-like mount- 
ing behavior. Mounts accomplished by an 
individual are classified as (a) complete 
mounts at the posterior end including pelvic 
thrusts, (b) posterior mounts without pelvic 
thrusts, and (c) abortive mounts which are 
not posteriorly oriented, do not involve 
clasping, and usually are not accompanied 
by pelvic thrusts. Recorded mounting 
activity is usually preceded by locomotor 
activity best described as prowling or stand- 
ing in one place and treading the floor of the 
cage with the hind feet. Both treading and 
prowling, when they precede mounting, are 
accompanied by the typical low gutteral 
growl or chatter. (5) Per cent of females 
brought into heat by the hormonal treatment 
(pp. 342-343). 


The means, together with SEs for the first four measures, are given in Goy and Jakway's Table 1,⁸ and are not repeated here.

⁸ The standard errors for the measure "number of mounts per oestrus" were recalculated from the distributions given in Goy and Jakway's Table 3. Our values are in general agreement with those given in their Table 1, taking account of distortion due to grouping, with the exception of Strain 13. The obviously erroneous value given in the last column of their Table 3 may be responsible for the discrepancy.

The data... were not normally distributed and the variances of the different genetic groups were unequal. Because of these characteristics, conventional parametric analysis was not feasible. Therefore only... non-parametric statistics were employed in the analysis (Goy & Jakway, 1959, p. 143).

However, in the case of Measures 2, 3, and 4 above, distributions in the form of proportions are given, which has enabled rescaling as necessary. The methods used with the male guinea pigs in rearing, testing, and scoring their sexual behavior are described by Jakway (1959) as follows:

The animals were left with their own dams and siblings until weaning on day 25. They were then placed in individual cages 2 ft. × 2 ft. × 1 ft. with two females of their own age. The caging in such groups assured each male of the contact with other animals which is necessary to bring out the behavioural differences between males of the two inbred strains. On day 73 the female cagemates were removed. Between the ages of 77 and 120 days each animal was observed in seven, approximately weekly, 10-minute tests with oestrus females. The mean score from this number of observations is expressive of the mating performance of a given animal. Elements or measures of sexual behaviour . . . are defined as follows: Circling is the term employed when the male circles the female. Sniffing and nibbling is recorded each time the nose of the male touches the female other than in the anogenital region. Nuzzling is recorded when the nose touches the anogenital region of the female. Mounting is scored when the male places both forepaws on the female. Intromission is recorded when the penis penetrates the vaginal orifice. This is accompanied by rhythmic pelvic thrusts. Ejaculation is accompanied by a convulsive contraction of the haunches and terminates the display of sexual behaviour. A test score is a numerical value reflecting three factors: the interval of ejaculation (latency of ejaculation), the amount of sexually oriented activity, and the maturity level of the behaviour. With the exception of circling which is not scored, each measure is given a numerical value from the lowest for sniffing and nibbling to the highest for ejaculation. The value of each is then multiplied by a factor expressive of latency of ejaculation; the shorter the latency, the higher the factor. Since most tests in which ejaculation occurred were terminated before the end of the tenth minute, measurements other than scores will be expressed as rates/15 seconds. Inasmuch as a sexual behaviour score can be attained in several ways, the components were analysed separately for possible patterns of inheritance (p. 151).

The means and standard errors for 
each component and the composite 
score are grouped together in Table 5. 
In each case Jakway gives the percentage distributions, which have enabled us to rescale the data as necessary.

The results of the joint scaling 
tests are summarized in Table 6. 
Only two measures, duration of maxi- 
mum lordosis in females and number 
of ejaculations in the males show genic 
interactions on the chosen scales. All 
the measures show significant herit- 
able variation and only one, circling 
in males shows no significant dom- 
inance. 

For the four female measures the 
potence ratio [h]/[d] is approxi- 
mately half and for three of the meas- 
ures, latency of estrus, duration of 
maximum lordosis, and frequency of 
mounting it is also negative. That is, 
for these measures the parent with 
the lower score contains a prepon- 
derance of dominant genes. For the 
male measures the potence ratios are 
more variable, ranging from non- 
significant for circling to greater than 
10 for number of ejaculations; in fact 
for three measures, nuzzling, intro- 
missions, ejaculations, as well as for 
the composite score, the ratio is 
greater than one, that is, all these 
measures show heterosis. 
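The potence ratio statements above can be checked against the [d] and [h] estimates in Table 6, and against the log-scale values for number of ejaculations given below. The sketch takes a ratio whose magnitude exceeds one as the heterosis criterion, following the text's usage. The dictionary contents are copied from the table and text, and the names are hypothetical.

```python
# ([d], [h]) pairs from Table 6; the ejaculation values are the log-scale
# estimates quoted in the text for that measure.
components = {
    "Latency of estrus": (1.2, -0.88),
    "Duration of estrus": (1.6, 0.99),
    "Duration of maximum lordosis": (7.3, -3.17),
    "Frequency of mounting": (8.7, -4.71),
    "Nuzzling/15 seconds": (0.10, -0.16),
    "Intromissions/15 seconds": (0.01, 0.09),
    "Ejaculations (log scale)": (0.012, 0.126),
}


def potence_ratio(d, h):
    """Potence ratio [h]/[d]; a magnitude above one is read as heterosis."""
    return h / d


for measure, (d, h) in components.items():
    ratio = potence_ratio(d, h)
    print(f"{measure:30s} {ratio:7.2f}  heterosis={abs(ratio) > 1}")
```

The four female ratios fall between about 0.4 and 0.7 in magnitude, three of them negative. The nuzzling, intromission and ejaculation ratios all exceed one in magnitude, with ejaculations above 10.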

Estimates of the components of the generation means and their standard errors showed that the [j]-type interaction is primarily responsible for the significant deviation from additivity in duration of maximum lordosis, with a value of 6.3 ± 2.4. In the measure number of ejaculations, the same effect comes from the [l]-type interaction (6.0 ± 2.0). Tests of inhomogeneity of the P₁, P₂, and F₁ variances show that five measures exhibit significant genotype-environmental interactions. Three are female measures (latency of estrus, duration of maximum lordosis, and frequency of mounting) and two are male measures (circling and nuzzling). In all, therefore, the data from six measures would require rescaling to remove either genic or genotype-environmental interaction in order to proceed with the analysis of the second degree statistics. Unfortunately, the observations on latency of estrus in the females are not presented in a manner which allows rescaling. For the other measures both square root and log transformations were made, and the latter in all cases removed both sources of nonindependence. Thus, for the two measures showing genic interaction, the joint scaling tests on the log transformed data gave values of [d] = 0.138 ± 0.014, [h] = 0.030 ± 0.028 for duration of maximum lordosis, and [d] = 0.012 ± 0.009, [h] = 0.126 ± 0.018 for number of ejaculations. Both now gave satisfactory fits with the observed data. A solution for D, H, and E₁ was therefore attempted from the second degree statistics of all the measures on all scales. These estimates proved to be disappointing. Only two measures, frequency of mounting in the females and intromissions in the males, gave evidence of segregation in the F₂ and backcross generations on any of the three scales employed; in the former case, on two of them. That is, only for these two measures were the magnitudes of $V_{F_2}$ and $V_{B_1} + V_{B_2}$ greater than the estimates of their environmental components ($E_1$ and $2E_1$, respectively) obtained from the parental and F₁ variances. The results of these estimations are given in Table 7.

TABLE 6
JAKWAY AND GOY'S DATA: COMPONENTS OF MEANS AND THEIR STANDARD ERRORS FROM JOINT SCALING TESTS, AND HERITABILITIES

                                          Components of means
Measure                                   [d]            [h]             Interaction   Heritability (%)
Females
Latency of estrus (hours)                 1.2±0.1        -0.88±0.21
Duration of estrus (hours)                1.6±0.2        0.99±0.30
Duration of maximum lordosis (seconds)    7.3±0.2        -3.17±1.16
Frequency of mounting                     8.7±0.5        -4.71±0.88
Males
Circling/15 seconds                       0.18±0.06      -0.17±0.18
Nuzzling/15 seconds                       0.10±0.02      -0.16±0.04
Mounting/15 seconds                       0.2±0.02       -0.12±0.03
Intromissions/15 seconds                  0.01±0.005     0.09±0.01
Ejaculations in 7 tests                   0.34±0.18
Sexual behavior score                     0.98±0.13

TABLE 7
JAKWAY AND GOY'S DATA: COMPONENTS OF VARIATION AND HERITABILITY

                                    Components                     Heritability
Measure                  Scale      D        H        E₁          D/(D+E₁) (%)   (½D+¼H)/(½D+¼H+E₁) (%)
Frequency of mounting    Linear     16.68    ...      ...         61.4           50.0
Frequency of mounting    Log        0.088    ...      ...         53.5           56.0
Intromission             Linear     0.0014   0.0016   0.0011      56.0           50.0

Estimates of heritability are therefore only possible for frequency of mounting and intromissions, and these lie between 50% and 60% (see Table 7). An estimate of the number of effective factors (K) is also confined to these two measures, neither of which gives values greater than one. It is nonetheless quite clear that more than one gene must be involved in the inheritance of the characters which show genic interaction. To obtain some idea of the heritability of the remaining measures, a first degree equivalent of one of our estimates has been evaluated, namely, [d]/([d] + e). Here e is the mean standard error derived from the sampling errors of the generation means as follows:⁹

$$e = \sqrt{\left(V_{\bar{P}_1} + V_{\bar{P}_2} + V_{\bar{F}_1} + V_{\bar{F}_2} + V_{\bar{B}_1} + V_{\bar{B}_2}\right)/6}$$

and the results are indicated in Table 6.
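The first degree heritability [d]/([d] + e) described above needs only [d] and the standard errors of the six generation means. The minimal sketch below reads e as the root-mean-square of those standard errors, i.e., the square root of the mean sampling variance. This reading of the printed formula is an assumption. The inputs in the example are hypothetical, not values from Goy and Jakway.

```python
import math


def mean_standard_error(standard_errors):
    """Root-mean-square of the standard errors of the generation means."""
    return math.sqrt(sum(se ** 2 for se in standard_errors) / len(standard_errors))


def first_degree_heritability(d, standard_errors):
    """[d] / ([d] + e), with e the mean standard error of the six generation means."""
    e = mean_standard_error(standard_errors)
    return d / (d + e)


# Hypothetical inputs: [d] = 1.2 and SEs for P1, P2, F1, F2, B1, B2.
ses = [0.1, 0.1, 0.1, 0.2, 0.2, 0.2]
print(round(first_degree_heritability(1.2, ses), 3))
```

As footnote 9 explains, pooling the segregating generations into e is defensible only because they showed no heritable variation here.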

The failure of the second degree statistics to show even evidence of segregation for 8 of the 10 measures analyzed requires further comment. A number of contributory factors are present which might reduce or bias our estimates of these statistics as derived from our transformations of the data as published. These include: (a) grouping of the data into as few as five classes, (b) the use of metrics which place over 50% of the individuals into the zero class, and (c) grouping all scores higher than an upper limit set by the higher parent or F₁ into one class, which may contain 30% of the individuals of a segregating generation.

The exact consequences of these procedures are difficult to ascertain, but it is clear that they lead to scalar problems which have not been resolved by either square root or log transformations. They could easily reduce the variances of the segregating generations (F₂, B₁, and B₂) to the level of those for the nonsegregating generations (P₁, P₂, and F₁).

⁹ This formula is only applicable in this case because of the demonstrated absence of heritable variation in the segregating F₂ and backcross generations. These may therefore be treated as equivalent to the parental and F₁ generations in displaying only environmentally induced variation, although, of course, they would normally be excluded from this type of estimate.

Our conclusions are therefore necessarily drawn mainly from the analysis of the first degree statistics, and are substantially in agreement with those of Goy and Jakway (1959) and Jakway (1959). There are, however, differences in detail. To take one example, they make no allowance for the possibility of genic interaction, which is unambiguously present in two of the measures on the original scale. In consequence, they consider that maximum lordosis in females is under the control of a single genetic factor without dominance. That interpretation is difficult to uphold in view of our demonstration of significant [j]-type nonallelic interactions.


Scott 


We can now turn to the analysis of less complete sets of data, the most recent among which is that of Scott (1954). This is one small segment of the data collected at Bar Harbor on the performance of five thoroughbred strains of dogs. The two strains for which crossbred data have so far been reported are the cocker spaniel and the African basenji, or barkless dog. The breeding program used is described as follows:


It was found that all of the breeds showed a great deal of variability, a large part of which appeared to be hereditary since offspring of different matings gave different results. In order to reduce this variability somewhat, the animals chosen from the parent strains for the crossbreeding experiment were descended from one brother × sister mating in the case of the basenjis and from two matings of a single male with his sister and mother in the case of the cocker spaniels. No selection of these individuals was used except that the original pairs were vigorous and healthy animals. As it turned out later, these did not necessarily illustrate the extremes of either breed in all characteristics. Reciprocal crosses were made between these two groups of siblings, and an effort was made to obtain at least four different pairs in each case, giving two F₁ populations. F₁ males were backcrossed to the mothers so that backcross and F₁ animals raised by the same mothers could be compared. Finally, F₂ populations are being obtained from both crosses (Scott, 1954, p. 745).


The subjects were reared in a care- 
fully standardized manner as part 
of the program, and subjected to a 
battery of psychological and phys- 
iological tests at various predeter- 
mined stages in their life history. 
Scott (1954) gives the results of 
analyses involving nine different 
measures, but only in one case is the 
grouped data for the distributions of 
the scores given, thus enabling cal- 
culations of the approximate means 
and variances for the various genera- 
tions. These were the scores derived 
from a barrier test given the pups at 
the age of 6 weeks on 2 days. The 
task is to seek the way round a barrier 
to reach the experimenter and food. 

The F₂ data have not yet been published, so that the results to be found in Table 8 relate to the parental, F₁ and backcross lines only. With both backcross means higher than either of the parental or F₁ means, no scale is possible on which genic interaction is absent. It is not surprising, therefore, to find significant deviations from additivity of gene action on the two available scaling tests, the A and B tests (p=0.01 and 0.05, respectively). Nor is this situation improved by a square root or a log transformation.
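The two scaling tests can be recomputed from the rounded entries of Table 8 as a check. The sketch below assumes Mather's usual definitions, A = 2B₁ − P₁ − F₁ and B = 2B₂ − P₂ − F₁, treats the generation means as independent, and uses a normal approximation for the p values. It also assumes, from the garbled header of Table 8, that 5.4 is the F₁ mean and that 12.8 and 14.5 are the backcrosses to the basenji and the cocker spaniel, respectively. The helper names are illustrative only.

```python
from math import erfc, sqrt

# Table 8 entries as (mean, standard error); actual parents used for P1, P2.
P1 = (2.5, 1.7)    # basenji
P2 = (10.8, 2.3)   # cocker spaniel
F1 = (5.4, 1.0)    # assumed F1 column
B1 = (12.8, 2.7)   # assumed backcross to P1
B2 = (14.5, 2.9)   # assumed backcross to P2


def combine(terms):
    """Value and standard error of sum(c * mean) over independent means."""
    value = sum(c * mean for c, (mean, _) in terms)
    se = sqrt(sum((c * s) ** 2 for c, (_, s) in terms))
    return value, se


def two_sided_p(z):
    """Two-sided p value for a standard normal deviate z."""
    return erfc(abs(z) / sqrt(2))


A = combine([(2, B1), (-1, P1), (-1, F1)])
B = combine([(2, B2), (-1, P2), (-1, F1)])

for name, (value, se) in (("A", A), ("B", B)):
    z = value / se
    print(f"{name} = {value:.1f} +/- {se:.1f}, z = {z:.2f}, p = {two_sided_p(z):.3f}")
```

With these inputs A is about three times its standard error and B about twice its standard error, in keeping with the significance levels quoted above.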

In the absence of the F₂ generation mean we can only estimate [d] and [j] among the components of the means. In such a situation, we can, however, obtain estimates of various compounds of the remaining components. Thus,

$$[h] - [i] = F_1 - \tfrac{1}{2}(P_1 + P_2)$$
$$[i] + [l] = P_1 + P_2 + 2F_1 - 2B_1 - 2B_2$$
$$[h] + [l] = \tfrac{1}{2}P_1 + \tfrac{1}{2}P_2 + 3F_1 - 2B_1 - 2B_2$$


Scott's data give the following 
values for these components on the 
linear scale: 


$$[d] = 1.7 \pm 3.9$$
$$[j] = -4.9 \pm 8.4$$
$$[h] - [i] = -1.3 \pm 1.9$$
$$[i] + [l] = -30.5 \pm 8.6$$
$$[h] + [l] = -31.8 \pm 8.6$$


Only [i]+[l] and [h]+[l] are significant; therefore the only certain feature of these data is the presence of nonallelic interactions.
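The three compounds can be evaluated from the rounded means of Table 8 (actual parents) as a check. The Python sketch below assumes, from the garbled header of that table, that 5.4 is the F₁ mean and that 12.8 and 14.5 are the backcross means; because the formulas above are symmetric in P₁, P₂ and in B₁, B₂, the labelling of the backcrosses does not affect the results. Standard errors are combined as for independent means, and the helper name is illustrative only.

```python
from math import sqrt

# Table 8 entries as (mean, standard error), actual parents.
P1, P2 = (2.5, 1.7), (10.8, 2.3)
F1 = (5.4, 1.0)                    # assumed F1 column
B1, B2 = (12.8, 2.7), (14.5, 2.9)  # backcross means


def combine(terms):
    """Value and standard error of sum(c * mean) over independent means."""
    value = sum(c * mean for c, (mean, _) in terms)
    se = sqrt(sum((c * s) ** 2 for c, (_, s) in terms))
    return value, se


compounds = {
    "[h]-[i]": combine([(1, F1), (-0.5, P1), (-0.5, P2)]),
    "[i]+[l]": combine([(1, P1), (1, P2), (2, F1), (-2, B1), (-2, B2)]),
    "[h]+[l]": combine([(0.5, P1), (0.5, P2), (3, F1), (-2, B1), (-2, B2)]),
}

for name, (value, se) in compounds.items():
    print(f"{name} = {value:.2f} +/- {se:.2f} (ratio {value / se:.1f})")
```

The printed values for [i] + [l] and [h] + [l] are recovered to within the rounding of the table, and only those two compounds exceed twice their standard errors. The standard error obtained here for [h] − [i] (about 1.7) is slightly smaller than the printed one.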

From the second degree statistics we can estimate only D+H, (dh) and E₁, but we cannot place much reliance on their values in the presence of both unscalable nonallelic interactions and slight genotype-environmental interaction (p=0.05).


TABLE 8
SCOTT'S DATA: MEANS AND THEIR STANDARD ERRORS IN NUMBER OF ERRORS AND (n)

Generation Means

                     P₁ (Basenji)        P₂ (Cocker Spaniel)
Actual parents       2.5 ± 1.7 (16)      10.8 ± 2.3 (26)
Total population     3.2 ± 1.0 (39)      12.5 ± 1.7 (49)

F₁                   B₁                  B₂
5.4 ± 1.0 (41)       12.8 ± 2.7 (27)     14.5 ± 2.9 (23)
A square root transformation removes the latter without, as has been noted, removing the genic interaction. On this scale:

$$D + H = 0.329, \qquad (dh) = 0.008 \qquad \text{and} \qquad E_1 = 0.084$$


Although we cannot separate D and H, we can estimate the likely order of heritability by putting D=H and H=0 in turn, which lead to estimates of 66% and 80%, respectively. The value of (dh) provides no evidence of dominance, and if the genes show dominance the dominant alleles must be equally distributed between the two parents. With a nonsignificant estimate of [d] on all scales tried, no estimate of the number of effective factors has been attempted. The presence of genic interactions, however, shows that a number of genes are involved.
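The text does not state the formula behind this heritability index. One form that reproduces both quoted figures from D + H = 0.329 and E₁ = 0.084 is D/(D + E₁); the sketch below adopts that form purely as an assumption and evaluates it under the two limiting cases, D = H and H = 0.

```python
# Square-root scale estimates for Scott's data, as quoted in the text.
D_PLUS_H = 0.329
E1 = 0.084


def index(d, e1):
    """Assumed heritability index D / (D + E1), as a percentage."""
    return 100 * d / (d + e1)


# D = H: D is half of the compound D + H.  H = 0: D is all of it.
cases = {"D = H": D_PLUS_H / 2, "H = 0": D_PLUS_H}
for label, d in cases.items():
    print(f"{label}: D = {d:.4f}, index = {index(d, E1):.0f}%")
```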


Tryon 


A further set of data are provided by Tryon (1929, 1940, 1942), whose study of selective breeding for "maze-brightness" and "maze-dullness" in rats is probably the best known in the whole of psychogenetics. He selected through 22 generations of brother×sister mating for high and low error scores in a 17-unit automatic maze in which the rats were trained for 19 trials to run to a food reward. He claims that


Rigorous environmental controls were effected (1) by instituting standard procedure of animal care and of breeding, (2) by using an automatic mechanical device for delivering the animals into the maze without handling, and (3) employing an electric recorder for the scoring of each rat's maze run (Tryon, 1940, p. 112).

P. L. BROADHURST AND J. L. JINKS


Elsewhere Tryon (1931) details the 
husbandry and comments as follows: 


Very special efforts were made to keep am- 
bient influences the same for all the cages in 
which these animals lived before they learned 
the maze (Tryon, 1929). Each animal lived 
with its siblings until shortly after weaning 
time (30 days), when it was numbered by 
punching its ears. Then it was placed with 4 
animals from other litters in a cage in which it 
lived until it ran the maze. Each living cage 
possessed an ever present supply of food and 
water. All cages were cleaned at the same time 
and in the same manner. Even so, it would be 
naive to suppose that life within a cage was
identical for all animals. Any rat experimenter 
knows that social life within a cage is variable 
and complex. But it would not seem likely 
that the difference in experience of different 
rats in the same cage would to any significant 
degree cause differences in the later learning 
of the maze under the remote solitary experi- 
mental conditions (p. 316). 


A cross was made between the two strains developed, and the F₁ and F₂ generations bred, though it is not clear at what stage in the selection experiment this was done. Tryon (1940) only gives the results in the form of histograms showing the percentage of subjects having a particular error score on his "normalized" scale, but from these it has been possible to make approximate reconstructions of the original distributions. The results obtained in this way are given in Table 9.

In the absence of the backcrosses we can apply only the C scaling test, and this suggests that on the chosen scale genic interactions are absent. We can therefore estimate [d] and [h] in the manner discussed in the next section, which gives values of 117.0 ± 3.8 and −21.3 ± 4.0, respectively. Thus we have a large significant additive effect and a small but significant dominance contribution.

TABLE 9
TRYON'S DATA: MEANS AND THEIR STANDARD ERRORS IN NUMBER OF ERRORS AND (n)

Generation Means

P₁ (bright)          P₂ (dull)            F₁                   F₂
25.9 ± 0.9 (85)      142.9 ± 3.7 (53)     63.1 ± 3.5 (133)     71.2 ± 2.9 (202)
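Table 9 can be used to check the C scaling test and the estimate of [h]. The sketch assumes the standard definition C = 4F₂ − 2F₁ − P₁ − P₂ with independent generation means, and takes [h] = F₁ − ½(P₁ + P₂) with [i] set to zero, which reproduces the quoted −21.3 ± 4.0. The quoted [d] of 117.0 ± 3.8 coincides numerically with the parental difference P₂ − P₁ and its standard error, so the sketch reports that difference without assuming which convention lies behind it. The helper name is illustrative only.

```python
from math import sqrt

# Table 9 entries as (mean, standard error).
P1 = (25.9, 0.9)    # bright
P2 = (142.9, 3.7)   # dull
F1 = (63.1, 3.5)
F2 = (71.2, 2.9)


def combine(terms):
    """Value and standard error of sum(c * mean) over independent means."""
    value = sum(c * mean for c, (mean, _) in terms)
    se = sqrt(sum((c * s) ** 2 for c, (_, s) in terms))
    return value, se


C = combine([(4, F2), (-2, F1), (-1, P1), (-1, P2)])  # C scaling test
h = combine([(1, F1), (-0.5, P1), (-0.5, P2)])        # [h], taking [i] = 0
diff = combine([(1, P2), (-1, P1)])                    # parental difference

for name, (value, se) in (("C", C), ("[h]", h), ("P2 - P1", diff)):
    print(f"{name} = {value:.1f} +/- {se:.1f}")
```

C comes out at about −10.2 with a standard error of about 14.1, well within two standard errors of zero, which is consistent with the absence of genic interaction on this scale.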

Unfortunately, there is significant genotype-environmental interaction (p<0.01), which, in the absence of significance in the C scaling test, we will not attempt to scale out.¹⁰ Our analysis of the second degree statistics is thus prospectively biased. In any case only estimates of ½D+¼H = 646.63 and E₁ = 1006.01 can be obtained, giving the percentage of heritable variation in the F₂ population as 39. If we assume D=H or H=0, we can (a) estimate the number of effective factors as 14.1 and 10.6, and (b) obtain values for our other index of heritability of 49% and 56%, for these two situations, respectively.

Thus the pattern of rat behavior investigated by Tryon is controlled by many genes which are additive in their effect but interact with the environment. They show dominance, and there is a preponderance of dominant genes for a low score. The heritabilities for this character are about average.
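The second degree figures for Tryon's data can be checked in the same way. The percentage of heritable variation in the F₂ follows from the expectation V(F₂) = ½D + ¼H + E₁. For the case H = 0, the sketch assumes that the "other index" is D/(D + E₁) and that the number of effective factors is estimated as k = [d]²/D, taking [d] = 117.0 as printed. Both forms are assumptions that happen to reproduce the quoted 56% and 10.6; they do not reproduce the quoted figures for D = H (49% and 14.1), so that case is left out.

```python
# Second degree estimates for Tryon's data, as quoted in the text.
HALF_D_QUARTER_H = 646.63  # 1/2 D + 1/4 H
E1 = 1006.01
D_BRACKET = 117.0          # [d] as printed

# Heritable percentage in F2, using V(F2) = 1/2 D + 1/4 H + E1.
v_f2 = HALF_D_QUARTER_H + E1
f2_heritability = 100 * HALF_D_QUARTER_H / v_f2

# Case H = 0: 1/2 D equals the whole compound, so D is twice it.
D = 2 * HALF_D_QUARTER_H
other_index = 100 * D / (D + E1)   # assumed form of the "other index"
k = D_BRACKET ** 2 / D             # assumed estimator of effective factors

print(f"F2 heritable variation: {f2_heritability:.0f}%")
print(f"H = 0: D = {D:.2f}, other index = {other_index:.0f}%, k = {k:.1f}")
```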


Vicari 


A further set of data which omit the backcrosses are provided by the investigations of Vicari (1929). These are the earliest psychogenetical experiments which yield any data amenable to analysis by the methods proposed here. She used four strains of mice which were "closely inbred" and derived F₁ and F₂ generations from them. One of these strains, the Japanese waltzer, was regarded at the time as being of a different species from the other three (Mus musculus), and the offspring of the cross involving it were consequently regarded as hybrids. It has since been shown (see Grüneberg, 1952), however, that the Japanese waltzer is a subspecies of Mus musculus and, furthermore, that it differs from the normal in a genetically complex manner; that is, the waltzer condition is not due to a single gene difference. Vicari's measures were derived from a simple two-choice maze in which the subjects ran to a food reward. No details of the deprivation schedule used to motivate the animals to run this maze are given, and it is clear from the running times reported that the apparatus itself was ill-designed for the purpose of obtaining efficient learning. Despite these difficulties, however, Vicari reports the results after 14 trials for a substantial number of subjects from the four parental generations and the three F₁s and F₂s bred from them. We can, of course, only apply the C scaling test, the results of which are given along with the generation means in Table 10.

¹⁰ For most of the data previously considered in this paper, rescaling has been applied because of a failure of both scaling tests. No attempt has been made, however, to scale out the significant genotype-environmental interaction in this case, on the grounds that improvement in respect of the latter disturbance might be at the expense of the satisfactory outcome of the test for additivity.
Where there is a significant deviation on the scaling test, only estimates of the following compounds and their standard errors can be obtained:

$$[d] - \tfrac{1}{2}[j] = \tfrac{1}{2}(P_1 - P_2)$$

and

$$[h] - [i] = F_1 - \tfrac{1}{2}(P_1 + P_2)$$

as before, and from Scaling Test C itself:

$$2[i] + [l] = P_1 + P_2 + 2F_1 - 4F_2$$


[TABLE 10: generation means and C scaling tests for Vicari's data; not legible in the source scan]
TABLE 11
VICARI'S DATA FOR JAPANESE WALTZER × ALBINO CROSS: MEANS AND THEIR STANDARD ERRORS IN MEAN RUNNING TIME AND (n)

Generation Means

Trial   Japanese Waltzer (P₁)   Cross P₁ × P₂: F₁   F₂
1       113.7 ± 12.0 (80)       74.2 ± 7.4 (151)    86.8 ± 11.7
4       (illegible in the source)
8       52.3 ± 10.0 (45)        47.0 ± 5.9 (128)    60.3 ± 10.0
14      83.0 ± 16.8 (28)        21.1 ± 2.6 (119)    51.7 ± 11.6

* Significant at the 5% level.
** Significant beyond the 1% level.


But direct estimates of [d] and [h] can be obtained where there is no significant deviation on the scaling test by assuming [i] and [j] = 0, as in the case of Tryon's data above. Of the 12 sets of data, i.e., four measures recorded for each of the three crosses, eight show significant genic interaction and eight genotype-environmental interaction on the chosen scales. Unfortunately we cannot rescale the data because Vicari does not give the individual scores on which the means and variances are based. She does, however, present the distributions for one measure, mean running time, for different stages in the experiment: the first, fourth, eighth and fourteenth trials. We shall use the cross P₁ × P₂ to illustrate the further analyses, the relevant generation means appearing in Table 11. The observed decrease in the number of subjects over successive trials is attributed by Vicari to death, escapes, failures to run, etc. An analysis of variance of the 4 × 4 table, that is, four generation means in each of four trials, shows a highly significant difference between trials (p < 0.01) and a significant difference between generations (p = 0.05–0.01) when compared with the interaction mean square for generations × trials, which has the same order of magnitude as the sampling variance of the generation means. The significant difference between generations is expected if the character is inherited, and the significant difference between trials, the running time falling steadily as the number of trials increases, reflects a strong training component which is presumably nonheritable. Since the C scaling test, when applied to these data, detected no significant deviations due to genic interaction (Table 11), we can estimate [d] and [h] directly from the generation means on the original scale.¹¹ For the first trial [d] = 10.7 ± 8.2 and [h] = −28.8 ± 17.8, neither of which is significant; that is, there is no significant heritable component of the generation means. The fully trained performance in the fourteenth trial gives [d] = 26.7 ± 8.7 and [h] = −35.2 ± 9.1. The components are similar in absolute and relative magnitudes in these two extreme cases, but their significance is higher in the last trial. This greater heritability in the last trial is supported by the second degree statistics. Thus in the first trial the percentage of heritable variation in the F₂ population is not significantly different from zero, while in the fourteenth trial it is 51%. Furthermore, the fourth and eighth trials in this case give intermediate values of 21% and 29%, respectively. Hence heritability increases almost linearly with the number of trials, the final performance being more heritable than the initial performance.

¹¹ In ignoring the significant genotype-environmental interactions shown in Table 11, we are following the argument of Footnote 9.

Before attempting an interpreta- 
tion of this finding two points must 
be borne in mind. Firstly, the ex- 
perimenter’s skill may have im- 
proved in successive trials, thus re- 
ducing the nonheritable component 
of variation, though this is unlikely 
since the generations were probably 
not all tested at the same time, and, 
secondly, almost 30% of the animals 
scored in the first trial were missing 
in the fourteenth for a number of 
reasons. If the missing third were not 
a random sample of the original sub- 
jects a progressive bias could have 
been introduced. It is not possible to 
ascertain from the data available 
whether or not one or both of these 
factors is making a contribution to 
the observed trend. It is interesting, 
however, to note that the total variation
in the F₂ population remained
constant from the first to the last 
trial and the increased heritability 
results from a drop in the percentage 
due to nonheritable agencies from 
100 to 49. It seems likely, therefore, 
that what we have detected is in 
fact a real effect and that it repre- 
sents a progressive release of the 
performance from the effect of en- 
vironmental stimuli irrelevant to it. 

This effect of a progressive increase 
in the heritability of performance in- 
dicated in Vicari’s data would seem, 

if confirmed, to have interesting
implications. It might be related to
the change over from general to 
specific factors known to occur in the 
acquisition of skills (Fleishman, 1957; 
see also Wherry, 1939), and might
also indicate a method for assessing
the relative importance of environmental
variation in learning tasks.
The effect of the same environmental 
stimuli at different stages in a given 
task might be studied, as in Vicari’s 
situation, or at the same stage in dif- 
ferent tasks, as well as that of dif- 
ferent stimuli in either of such ar- 
rangements. As was noted earlier, 
the sort of problems which the in- 
clusion of environmental variation 
introduces into biometrical analyses 
has been discussed and possible solu- 
tions indicated (Jones & Mather, 
1958; Mather & Jones, 1958; van der 
Veen, 1959). While it is beyond the 
scope of this paper to enter into this 
matter in detail, it may be said that 
there is evidence for a genetical com- 
ponent in the determination of the 
variability of performance in such 
different environments, as opposed to 
the control of its actual expression 
in any single one of them which is 
what we have been dealing with so 
far. This variability is also suscep- 
tible to analysis by biometrical 
methods (Jinks & Mather, 1955). 
The analysis from this point of view 
of the only suitable behavioral data 
at present known to us (Broadhurst, 
1960) is not yet complete. 

Thus, while the nature of Vicari’s 
data makes our conclusions tenta- 
tive, the analyses discussed here 
have shown their advantages if only 
in indicating the complexity of the 
inheritance of the behavior patterns 
under investigation. 


Discussion 


We have presented the results of 
our reanalyses of all the data avail- 
able to us in the field of psycho- 
genetics with little reference to the 
outcome of analyses of other kinds 
performed, in some cases, by the 
authors concerned. The methods 


used have usually been those of
classical Mendelian genetics, which, 
though basic to biometrical genetics, 
cannot satisfactorily be applied in 
their simpler forms to the analysis of 
continuously variable characteristics. 
The clarity associated with the 
Mendelian analysis of discontinuous 
variation and attributable thereby to 
the effects of two or three major genes 
is not to be expected from biomet- 
rical methods, a major assumption 
of which is that the continuous 
phenotypic variation observed is the 
product of multiple genetical and 
environmental causes, largely un- 
specifiable in detail. Sometimes the 
argument from Mendelian analysis 
has been by analogy, which is not al- 
ways illuminating and may in certain 
instances be downright misleading. 
For example, the resemblance of the F₁ to one or other of the parental lines does not necessarily mean the same thing in biometrical genetics as it does in the simpler cases encountered in Mendelian analysis. According to the polygenic hypothesis, the departure of the F₁ mean from the mid-parental value depends both on the balance in the parental lines of the dominant and recessive polygenes and upon the respective direction of their cumulative effects. Dominants may be increasers or decreasers, that is, having a positive or a negative phenotypic effect, respectively, as expressed on the scale used, or there may be a balance between the two. Thus the F₁ mean value may be close to the upper parental mean on a scale because of a preponderance in the parents of dominants with positive effect in terms of the metric, while an F₁ close to the lower parental value will result from a preponderance of dominant decreasers. Intermediate values may express different degrees of balance operating. Failure to recognize this difference between potence, measured in terms of [h], and dominance, measured as √H, will in general lead to spurious disagreements between the level of dominance in the F₁s and that in the segregating F₂ and backcross generations (e.g., Jakway, 1959, p. 155). It is equally misleading to regard low potence, that is an F₁ close to the mean parental value, as a diagnostic characteristic of multifactorial inheritance (see Hall, 1951, p. 321).
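A toy calculation makes the distinction between potence and dominance concrete. The sketch below uses a deliberately simple additive-dominance model (no epistasis, no linkage, m taken as the mid-parent value) with hypothetical effects at four loci: every locus shows complete dominance, but two dominant alleles are increasers and two are decreasers. The F₁ then falls exactly at the mid-parent value, so [h] is zero and there is no potence, while the ratio √(H/D) is 1.

```python
from math import sqrt


def generation_means(m, d, h, p1_signs):
    """P1, P2 and F1 means under a simple additive-dominance model.

    m is the mid-parent value; d[i] and h[i] are the additive and
    dominance effects at locus i; p1_signs[i] is +1 if P1 carries the
    increasing allele at locus i and -1 otherwise.  No epistasis or linkage.
    """
    d_bracket = sum(s * di for s, di in zip(p1_signs, d))
    p1 = m + d_bracket
    p2 = m - d_bracket
    f1 = m + sum(h)          # the F1 is heterozygous at every locus
    return p1, p2, f1


# Hypothetical loci: complete dominance everywhere (|h| = d), but two
# dominant alleles are increasers and two are decreasers.
d = [1.0, 1.0, 1.0, 1.0]
h = [1.0, 1.0, -1.0, -1.0]
p1, p2, f1 = generation_means(10.0, d, h, [1, 1, 1, 1])

potence = f1 - (p1 + p2) / 2     # [h]: departure of F1 from the mid-parent
D = sum(x * x for x in d)
H = sum(x * x for x in h)
print(f"P1 = {p1}, P2 = {p2}, F1 = {f1}")
print(f"[h] = {potence}, sqrt(H/D) = {sqrt(H / D)}")
```

Reversing the sign of one h value would move the F₁ off the mid-parent value without changing H at all, which is the sense in which potence and dominance measure different things.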
Genetical methods and analyses more 
complex than those that have been 
considered here, such as the analysis 
of double back-crossing (Mather, 
1949), or of diallel crosses (Broad- 
hurst, 1959, 1960; Hayman, 1954; 
Jinks, 1954), are needed to enable 
more precise estimates to be made of 
the different factors operating. In 
this connexion it should be noted that 
the analysis in turn of single crosses 
from two parental strains such as we 
have been dealing with here is a rela- 
tively inefficient and laborious meth- 
od of making a biometrical analysis 
of quantitative data. Techniques in- 
volving crossing several pure strains 
at once are superior, and of these the 
diallel cross method in which a num- 
ber of strains are intercrossed re- 
ciprocally in all possible combina- 
tions and analyzed, together with the 
parental lines, in a single diallel table 
is probably the best. The merits of 
this approach at the present time in 
psychogenetics have been argued 
elsewhere (Broadhurst, 1960) ; they 
are, briefly, that the analysis can 
proceed at the F₁ stage, without the
necessity of breeding further genera- 
tions, and that the method thereby 
provides a quick survey in several 
parental strains at once of the genet- 
ical determinants of the character 
investigated. The former may be of 
particular utility in interspecies 
crosses, where hybrids are sometimes 
sterile, so precluding the possibility 
of breeding F₂s, etc. Intensive study
of the gene differences in pairs of 
strains selected for further analysis 
in this way can then follow. 

One source of satisfaction from 
the present work has been the con- 
sistently high heritabilities obtained 
from our analyses; and by choosing 
a scale, wherever possible, on which 
interactions between genes, and be- 
tween genes and environment are 
absent, these estimates are somewhat 
more reliable as well as higher than 
those obtained by the authors of 
the papers under review. Any in- 
efficiencies in the experimental de- 
sign—e.g., the imperfect control of 
environmental variation, or the un- 
reliability of the measures used—or 
in the statistical analysis used will 
reduce the values obtained below 
their true values. While it has been 
beyond our power to influence the 
first source of inefficiency in these 
data, our analyses have improved the 
position regarding the second. And 
the collaboration between psychol- 
ogist and geneticist will ensure that 
future researches are designed with 
due care and attention to the im- 
portance of the relevant genetical 
principles (see Broadhurst, 1960). 

The choice of scale is an important 
problem in biometrical analysis (see 
Mather, 1949) and, as we have seen, 
the need for rescaling, resulting from 
interactions between the genes and 
between the genes and the environ- 
ment, arises in the inheritance of 
some 50% and 70%, respectively, of 
the measures in the examples re- 
viewed here. In no case was the 
presence of genic interaction demon- 
strated as a significant factor prior to 
the analyses undertaken for this re- 
view and in only one case were steps 
taken to eliminate the gene-environ- 
ment type of interaction by a scalar 
change (Thompson & Fuller, see 
Footnote 3). And yet our analyses 
show that in almost all cases a simple
log transformation is sufficient to 
eliminate both causes of interaction 
and hence provide a scale on which 
unambiguous interpretations of dom- 
inance and potence effects can be 
made. 
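Why a log transformation helps when the variance is correlated with the mean can be seen with constructed numbers. In the sketch below (hypothetical data, not taken from any of the studies reviewed), the second generation's scores are exactly ten times the first's, so its variance is a hundred times larger on the raw scale. On the log scale the two sets differ only by the constant log 10, and their variances are identical.

```python
from math import log
from statistics import variance

# Hypothetical scores: the second generation is the first scaled by 10,
# so its standard deviation is proportional to its mean.
gen_a = [2.0, 4.0, 8.0]
gen_b = [x * 10 for x in gen_a]

raw_ratio = variance(gen_b) / variance(gen_a)
log_a = [log(x) for x in gen_a]
log_b = [log(x) for x in gen_b]
log_ratio = variance(log_b) / variance(log_a)

print(f"variance ratio, raw scale: {raw_ratio:.1f}")
print(f"variance ratio, log scale: {log_ratio:.3f}")
```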

The gene-environment interaction detected by the inhomogeneity of the variances of the parents and F₁s has two main causes. The major cause (some 60% of measures) is a correlation between mean and variance in the nonsegregating generations. A more interesting cause, however, in the remaining examples is the lower variance of the F₁ individuals compared with those of the parental generations, an effect which is independent of their means. This phenomenon, which is common to the inheritance of all types of characters and occurs equally among animals and plants, has received considerable attention of late (see Jinks & Mather, 1955; Lerner, 1954; Mather, 1953, for reviews). Extensive discussion of this point is, however, beyond the scope of the present review.

Some 70% of the genic interactions 
in these analyses are due to the [j]- 
type interactions. This has two im- 
plications. Firstly, there must be in- 
teractions between additive and dom- 
inance effects and, secondly, the inter- 
acting genes must be associated in 
the parental lines, the majority of 
increasing interacting genes being 
present in one parent and the de- 
creasers in the other. This is not an 
unexpected result when one con- 
siders that most of the experiments 
reviewed here have employed pa- 
rental lines which were chosen because 
they represented the extreme pheno- 
types immediately available or ob- 
tainable as a result of prolonged se- 
lection. 

This policy could explain two fur- 
ther features of these examples, 
namely, the rarity of heterosis and the often satisfactory estimates of the number of genetical factors attained by a method which, for reasons mentioned earlier and discussed more fully by Mather (1949), has often failed to give sensible values in other work. Without going into details, it is clear that if the better parent in a cross already contains the majority of the available increasing genes, it is unlikely to give rise to a superior F₁, irrespective of the dominance or interactive properties of the genes. Similarly, our estimate of the number of genetical factors assumes that the genes are associated in the parental lines. Failure of this assumption leads to underestimation. Our rather satisfactory estimates could, therefore, be a further indication that the genes are so distributed in the parental lines.
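The effect of dispersion on the estimated number of factors can be illustrated with hypothetical loci. The sketch assumes the estimator k = [d]²/D (an assumption here, since the text does not give the formula), where [d] is the sum of the d's signed according to which parent carries the increasing allele and D is the sum of their squares. With four equal loci, k recovers 4 when all the increasers are in one parent but falls to 1 when a single locus is dispersed.

```python
def effective_factors(d_effects, p1_signs):
    """Assumed estimator k = [d]**2 / D for a cross of two true-breeding lines.

    p1_signs[i] is +1 if P1 carries the increasing allele at locus i, else -1.
    """
    d_bracket = sum(s * d for s, d in zip(p1_signs, d_effects))
    big_d = sum(d * d for d in d_effects)
    return d_bracket ** 2 / big_d


d_effects = [1.0, 1.0, 1.0, 1.0]                          # four equal hypothetical loci
associated = effective_factors(d_effects, [1, 1, 1, 1])   # all increasers in P1
dispersed = effective_factors(d_effects, [1, 1, 1, -1])   # one increaser in P2
print(f"associated: k = {associated}, dispersed: k = {dispersed}")
```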

It is hoped that the reanalyses re- 
ported here serve as another example 
of the application of biometrical 
methods to psychological data in 
addition to those already available 
(Broadhurst, 1959, 1960). Consider- 
ing the unsatisfactory nature of much 
of the data at our disposal, it is felt 
that the outcome, in terms of the 
ease of the analyses, especially with 
regard to the search for suitable 
scales, and the consistency of the results
obtained, has been favorable. It
is not yet possible to pronounce on 
the general efficiency in this field of 
the methods advocated by us. Fur- 
ther proof of their suitability will only 
come when they have been applied 
more widely to data gathered from 
suitably designed experiments, per- 
haps along the lines indicated, so that 
replication of the genetical picture be- 
comes possible, thus enabling some 
specification to be made of the gen- 
erality of the determinants of a par- 
ticular behavioral characteristic. 


SUMMARY 


The techniques which can be used 
in the analysis of quantitative data 
by the methods of biometrical genet- 
ics were outlined and the importance 
of achieving a suitable scale noted. 
The body of the paper consists of 
descriptions of experiments in psy- 
chogenetics which lend themselves to 
this type of analysis, and a presenta- 
tion of the results of our reanalyses 
of the data they provide in terms of 
additive, dominance and interaction 
components of variation. We con- 
clude that, despite the unsuitable na- 
ture of some of the available data, 
the outcome indicates the utility of 
the biometrical approach. 
CURRENT TRENDS IN AUTOMATED TEACHING MACHINES

The same forces which have characterized the evolution of general educational practices are inherent to the history of the new science of automated teaching. As a result of the expansion and multiplying complexities of political, economic, and social interests, there developed an ever increasing need for the rapid education of large numbers of people. New educational objectives demanded new methods of instruction, and the history of education is marked by many diverse attempts at establishing more efficient teaching procedures. Once again teaching methods must be reevaluated. Rigid adherence to the principle of personal teacher-student relationships no longer seems feasible; an instructional system more appropriate for present-day needs must be established. It is probable that the use of automated teaching devices can fill this need in the method of education. As Corrigan (1959) has suggested:

the automated teaching method has grown out of a pressing need. This need has been created by a twofold technical training problem. As advances in science and [...] have been made, there has been an ever increasing demand for well-trained instructors; at the same time the availability of these trained persons has been diminishing. This situation is aggravated further by the increased scope and complexity of subjects, and the ever increasing ratio between number of instructors and students (p. 24).

Current interest in the area of automated teaching machines is illustrated by the simple index that for the years prior to 1948 there are only 6 references, whereas through 1959 there are more than 50 reports published.

¹ The research reported in this document was supported by the Department of the Air Force under Air Force Contract AF-33-(600)39852.

² The author wishes to acknowledge the valuable editorial assistance of Sylvia Pilsucki.

The grandfather of automated teaching machines is Pressey (1926, 1927), who designed machines for automated teaching during the mid-1920s. The device described [...] the American Psychological Association (APA) meetings in [...]. [...] improved device was [...] 1925 at the APA meetings. [...] correct answer, the machine advanced to the next item. If his response was incorrect, the machine scored an error and did not advance to the next item until the correct answer was chosen.

The capacity of the drum was 30 two-line typewritten items; the paper on which the questions appeared was carried as in a typewriter.
In 1927, Pressey summarized his
efforts as follows: 
The paper reports an effort to develop an ap- 
paratus for teaching drill material which (a) 
should keep each question or problem before 
the learner until he finds the correct answer, 
(b) should inform him at once regarding the 
correctness of each response he makes, (c) 
should continue to put the subject through the 
series of questions until the entire lesson has 
been learned, but (d) should eliminate each 
question from consideration as the correct 
answer for it has been mastered (p. 552). 
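Pressey's four requirements amount to a small drill procedure. The Python sketch below is a hypothetical rendering of criteria (a) to (d), not a description of his apparatus: the mastery rule (a question counts as mastered, and drops out, once it is answered correctly at the first attempt in a pass) and the TrialAndErrorLearner class are assumptions introduced for illustration.

```python
class TrialAndErrorLearner:
    """Hypothetical learner: tries options in order and remembers what worked."""

    def __init__(self, options):
        self.options = options
        self.known = {}      # question -> answer confirmed correct
        self.next_try = {}   # question -> index of the next option to try

    def respond(self, question):
        if question in self.known:
            return self.known[question]
        index = self.next_try.get(question, 0) % len(self.options)
        return self.options[index]

    def feedback(self, question, answer, correct):
        if correct:
            self.known[question] = answer
        else:
            self.next_try[question] = self.next_try.get(question, 0) + 1


def run_drill(items, learner, max_passes=50):
    """Return (errors, passes) for a drill over items {question: answer}.

    Assumes the correct answer is among the learner's options; otherwise
    the inner loop would never end.
    """
    remaining = list(items)            # questions not yet mastered
    errors, passes = 0, 0
    while remaining and passes < max_passes:    # (c) repeat the series
        passes += 1
        not_mastered = []
        for question in remaining:
            first_try = True
            while True:                # (a) stay on the question until correct
                answer = learner.respond(question)
                correct = answer == items[question]
                learner.feedback(question, answer, correct)   # (b) at once
                if correct:
                    break
                errors += 1
                first_try = False
            if not first_try:
                not_mastered.append(question)
        remaining = not_mastered       # (d) mastered questions drop out
    return errors, passes


items = {"q1": "b", "q2": "a", "q3": "c"}
errors, passes = run_drill(items, TrialAndErrorLearner(["a", "b", "c"]))
print(errors, passes)
```

In this toy run the second question is answered correctly at once and drops out after the first pass, while the other two need a second pass before they count as mastered.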


In 1930, Peterson devised a self-scoring, immediate feedback device. The Chemo Card, as this device was later called, utilized the technique of multiple choice. A special ink was used by the student in marking his answer. The mark appeared red if the answer was incorrect; a dark color resulted if the answer was correct. Although Pressey's notions and the Chemo Card might have stimulated an interest in automated teaching techniques in the twenties, educators and researchers obviously were not at that time ready for this advanced concept of teaching. Automated teaching did not take hold.

In 1932, Pressey published an 
article describing a kind of answer 
sheet which could be scored by an 
automatic scoring device. This ap- 
paratus recorded errors by item, and 
thus provided the instructor with 
clues as to what questions needed 
further instruction. In 1934, Little 
experimented with this device as well 
as with the device originated by 
Pressey in 1926. His results favored 
the use of automated devices in 
contrast to regular classroom tech- 
niques. 

The next appearance of automated 
teaching literature came a consider- 
able number of years later. During 
World War II, the Automatic Rater 
was used by the Navy for training. 
This device projected a question on 
a small screen; the subject's response
consisted of pushing one of five but- 
tons. 

In 1950, Pressey described a new 
automated device called the Punch- 
board. Multiple-choice questions 
were presented to the student. The 
key answer sheet inside the Punch- 
board contained holes opposite the 
correct answers only. If the answer 
was correct, the student's pencil 
penetrated deeply; if incorrect, the 
pencil did not penetrate the paper 
significantly. Angell and Troyer in
1948 and Angell in 1949 reported the
results of using the Punchboard.
Both studies suggested the superior- 
ity of this method over traditional 
classroom procedures. 

In 1954, Skinner published “The 
Science of Learning and the Art of 
Teaching,” which provided the basis 
for the development of his teaching 
machines. In this article, he stressed 
the importance of reinforcement in 
teaching and suggested teaching ma- 
chines as a method of providing this 
needed reinforcement for the learner. 

Reports concerning the Subject- 
Matter Trainer began to appear in 
1955 (Besnard, Briggs, Mursch, & 
Walker, 1955; Besnard, Briggs, & 
Walker, 1955). This electromechan- 
ical device is a large multiple-choice 
machine used essentially for training 
and testing in the identification of 
components and in general verbal 
subject matter. Extensive research 
has been done with this device be- 
cause of its considerable flexibility, 
i.e., it allows several modes of opera- 
tion for self-instruction: variety of
programed subject matter, drop-out 
feature after items have been mas- 
tered, etc. 

The Pull-Tab, used experimentally 
by Bryan and Rigney in 1956, was a
device in which the subject received
not only a “right” or “wrong” indica-
tion after his choice but also a some-
what detailed explanation of “why” 
a response was incorrect. In 1949, 
Briggs had found in experimenting 
with the Punchboard that learning is 
significantly enhanced by immediate 
knowledge of results. Bryan and 
Rigney's data illustrated that the 
combination of immediate knowledge 
of results plus explanation, if the 
student is in error, produced signifi- 
cantly higher scores on a criterion 
test than if no explanation had been 
given. The importance of this re- 
search from a historical point of view 
is that it investigated immediate 
knowledge of results as a factor exist- 
ing on a continuum with varying 
degrees of effect. Up to this point any 
comparison involving the effective- 
ness of teaching machines had been 
one between classroom instruction 
and the “new” machine under con- 
sideration. In Briggs’ and in Bryan 
and Rigney’s research, however, we 
see the beginning of a concern, to 
become greater in the next few years, 
with the possible effects of specific 
variables and their interactions on 
learning. 

The years 1957-58 mark the begin-
ning of the period in which resurgent
interest in teaching machines was 
initiated. Ramo’s arguments (1957) 
reopened the consideration of auto- 
mated techniques for classroom use. 
His article served as one of the more 
forceful attempts to alert educators 
to the needs and requirements for 
automated techniques in education. 
Skinner's continued interest (1958) 
served as the major catalyst in this 
area. In his article, he reviewed 
earlier attempts to stimulate interest 
in teaching machines and further ex- 
plained that the learning process was 
now better understood and that this 
increased sophistication would be 
reflected in teaching machine tech- 
nology. Skinner suggested that the
most appropriate teaching machine
would be that which permits the
student to compose his response rather 
than to select it from a set of alterna- 
tives. On the basis of this philosophy 
and in conjunction with other prin- 
ciples of learning theory to which 
Skinner adheres, he designed a teach- 
ing machine with the following char- 
acteristics. The questions, printed on 
a disk, are presented to the student 
through a window. The student's 
response is written on a paper tape, 
which is advanced under a trans-
parent cover when the student lifts
a lever. At this point the correct 
answer appears in the window. If the 
student is correct, he activates the 
lever in one manner, which eliminates 
the item from the next sequence. If
he is incorrect, the lever is activated
in a different manner, thus retaining
the item in the next sequence.

Holland (1960), a co-worker of
Skinner's, has suggested several well-
known learning principles that should 
be applied to teaching machine tech- 
nology: immediate reinforcement for 
correct answers is a must, learned be- 
havior is possible only when it is 
emitted and reinforced, gradual pro- 
gression (i.e., small steps in learning 
sequences and reducing wrong an- 
swers) is necessary to establish com-
plex repertoires, gradual withdrawal
(fading or vanishing) of stimulus sup-
port . . . the student should write
his . . . The Skinner machine . . .
. . . cardboard folder containing mimeo-
graphed material which is presented
one line at a time. The student, after 
writing his response on a separate 
sheet of paper, advances the paper in 
the mask, thereby exposing the cor- 
rect response. 

In 1958, a number of investigators 
interested in teaching machines rec- 
ommended that the programed mate- 
rial be a function of the student's re- 
sponse. This idea suggests that a
“wrong” response may not necessar-
ily be negative reinforcement and 
that both the “right” and “wrong” 
responses should modify the program. 
Rath and Anderson (1958) and Rath, 
Anderson, and Brainerd (1959) have 
suggested the use of a digital com- 
puter which automatically adjusts 
problem difficulty as a function of the 
response. Crowder’s (1958, 1959a, 
1959b) concept of “intrinsic program- 
ming” permits the response to alter 
the programing sequence. 
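The paper does not describe how Rath and Anderson's computer adjusted difficulty. Purely as a hypothetical illustration of the general idea, the sketch below uses a simple up-down (staircase) rule: the next problem is set one level harder after a correct response and one level easier after an error, within fixed bounds. The five-level scale and the functions `next_level` and `trace_levels` are assumptions, not features of the reported system.

```python
from typing import Iterable, List


def next_level(level: int, correct: bool, lowest: int = 1, highest: int = 5) -> int:
    """Hypothetical up-down rule: one level harder after a correct answer,
    one level easier after an error, clamped to [lowest, highest]."""
    step = 1 if correct else -1
    return max(lowest, min(highest, level + step))


def trace_levels(responses: Iterable[bool], start: int = 1,
                 lowest: int = 1, highest: int = 5) -> List[int]:
    """Return the difficulty level of the problem presented on each trial."""
    levels: List[int] = []
    level = start
    for correct in responses:
        levels.append(level)  # level at which this trial's problem was set
        level = next_level(level, correct, lowest, highest)
    return levels


print(trace_levels([True, True, False, True]))  # [1, 2, 3, 2]
```

An up-down rule is only one of many ways a response could adjust difficulty; it is chosen here because it is the simplest rule in which both right and wrong responses modify the program.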

During the last few years, re- 
searchers have been focusing their 
attention on investigating many of 
the variables which are pertinent to 
the design and use of teaching ma- 
chines. The seemingly simple task of 
defining a teaching machine has been 
a serious problem to many authors 
(Day, 1959; Silberman, 1959; 
Weimer, 1958). Some definitions 
have made more extensive demands 
on teaching devices than others. 
Learning theorists (Kendler, 1959; 
Porter, 1958; Skinner, 1957; Spence, 
1959; Zeaman, 1959) are now most 
outspoken concerning the application 
of theoretical concepts to teaching 
machine technology. Transfer of 
training, mediational processes, rein- 
forcement, motivation, conditioning, 
symbolic processes, and language 
structure are but a few of these areas 
of interest. 

There are indeed many other vari-
ables about which there is a diver-
gence of opinion and about which
experimental evidence is completely 
lacking or controversial. The reports 
of Skinner (1958), Israel (1958), 
Coulson and Silberman (1960), Fry 
(1959), and Stephens (1953) are all 
focused, at least in part, on questions 
related to response modes, e.g., mul- 
tiple choice, construction of the re- 
sponse, responses with reinforcement, 
etc. Briggs, Plashinski, and Jones 
(1955) investigated self-paced vs. 
automatically paced machines. The 
importance of motivation in connec- 
tion with teaching machines has been 
explored by Holland (unpublished), 
Mayer and Westfield (1958), and 
Mager (1959). 

Essentially, the history of auto- 
mated teaching is short—it started in 
the mid-twenties and was strenuously 
reactivated by the appearance of 
Skinner’s 1958 article. Empirical 
investigations of many important 
issues in this field are just now begin- 
ning to appear. However, the neces- 
sity of developing automated teach- 
ing methods has been evident for 
many years. 


GENERAL PROBLEM AREAS 

Definition 

As in any new field, the first prob- 
lem is one of definition. What is a 
teaching machine? Silberman (1959) 
says that a teaching device consists 
of four units: an input unit, an output 
unit, a storage unit, and a control 
unit. As such, this definition includes 
a broad category of devices, from the 
most simple to the most complex. 
Weimer (1958) goes beyond the de- 
vice itself, stating that a teaching 
machine must present information to 
the student as well as test the student 
by means of a controlled feedback 
loop. Crowder (1960) insists that a 
teaching machine 
must in some way incorporate two-way com-
munication. That is, the student must re-
spond to the information presented by the
machine, and the machine must in turn recog-
nize the nature of the student’s response and
behave appropriately (p. 12).


Perhaps the most inclusive definition 
is one given by Day (1959): 

A teaching machine is a mechanical device de- 
signed to present a particular body of informa- 
tion to the student.... Teaching machines 
differ from all other teaching devices and aids 
in that they require the active participation 
of the learner at every step (p. 591). 


Although the emphasis in some of the 
above concepts is different, together 
they give a rather complete descrip- 
tion and, if you will, definition. 


Programing 


The programing of subject matter
for teaching machines is the most ex-
tensive and difficult problem in this 
new technology. Beck (1959) de- 
scribes specific concepts which he 
thinks appropriate for programing a 
Skinner-type machine: 
A student’s responses may be restricted and 
guided in a great number of ways. These 
range from all types of hints... to simply 
presenting the response which it is desired a 
student acquire (p. 55). 


Carr (1959) discusses in some detail 
the importance of programing in 
terms of learning efficiency and reten- 
tion. Much of what he says remains 
open for empirical verification. Roth- 
kopf (1960) has suggested that the 
development of programed instruc- 
tion suffers from two difficulties: a 
weak rational basis for program writ- 
ing and inadequate subject-matter 
knowledge among program writers. 
The extent to which any initial 
program needs revision is perhaps ex- 
emplified by the program in Har- 
vard’s course Natural Sciences 114. 
Holland points out that the first
program of materials included 48
disks, each containing 29 frames,
whereas a revision and extension of
the program the following year in- 
cluded 60 disks of 29 frames each. 
Holland's objective was to extend the 
program and decrease the number of 
student errors. Crowder’s (1960) 
programing objectives are different 
from Holland's. He states: 

By means of “intrinsic programming” it [the 
program] recognizes student errors as they oc- 
cur and corrects them before they can impede 
understanding of subsequent material or ad- 
versely affect motivation (p. 12). 


Crowder considers it almost impos- 
sible to write a program which com- 
pletely avoids error, and therefore 
he structures the program require- 
ments on the probability of error. 
When an error is made, the next pres- 
entation explains the subject's mis- 
take. Depending on the nature of the 
error and when it occurs, the subject 
may either return to the original 
question or enter a program of cor- 
rectional material. 
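Crowder's scheme can be pictured as a network of frames in which each possible response names the frame to be shown next. The sketch below is a minimal, hypothetical illustration rather than Crowder's own format: the `Frame` class, the frame ids, and the arithmetic item are invented for the example. A wrong choice leads to a frame that explains the mistake and returns the learner to the original question, the first of the two routes described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame:
    text: str
    # Maps each possible response to the id of the next frame (hypothetical format).
    next_by_choice: Dict[str, str] = field(default_factory=dict)


# A tiny hypothetical program. Choice "a" is an error: the learner is sent to
# a correction frame that explains the mistake and then returns to "q1".
PROGRAM: Dict[str, Frame] = {
    "q1": Frame("7 x 8 = ?   (a) 54   (b) 56", {"a": "fix_a", "b": "q2"}),
    "fix_a": Frame("54 is 6 x 9, not 7 x 8. Return to the question.", {"ok": "q1"}),
    "q2": Frame("Correct: 7 x 8 = 56. End of this sequence."),
}


def run_program(program: Dict[str, Frame], responses: List[str],
                start: str = "q1") -> List[str]:
    """Follow the learner's responses through the program; return frame ids visited."""
    visited = [start]
    current = start
    for response in responses:
        nxt = program[current].next_by_choice.get(response)
        if nxt is None:  # no route for this response, or end of program
            break
        visited.append(nxt)
        current = nxt
    return visited


print(run_program(PROGRAM, ["a", "ok", "b"]))  # ['q1', 'fix_a', 'q1', 'q2']
```

A correctional sequence of several frames, the second route, would simply be a longer chain of frames before the link back to the question.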

Another concept for programing 
is known as branching (Bryan & 
Rigney, 1959). Through branching, 
many possible routes are provided 
through which the subject can pro- 
ceed, depending on the response. The 
subjects are allowed to skip certain 
material if they have demonstrated 
a knowledge of it. One study (Coul- 
son & Silberman, 1960) suggests that 
under branching conditions subjects 
require less training time than under 
nonbranching conditions; however, 
results on the criterion test were not 
significantly different. 
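The skipping rule in branching can be stated compactly. In the hypothetical sketch below, a program is divided into segments, each preceded by a probe question; a learner who answers a segment's probe correctly is routed past that segment's teaching frames. The segment layout and the names `Segment` and `branch_route` are assumptions for illustration, not a description of Bryan and Rigney's materials.

```python
from typing import Dict, List, Tuple

# (segment name, correct probe answer, teaching frames): a hypothetical layout.
Segment = Tuple[str, str, List[str]]


def branch_route(segments: List[Segment], probe_responses: Dict[str, str]) -> List[str]:
    """Return the teaching frames the learner is routed through.

    A segment is skipped when the learner's probe response matches its
    correct answer, i.e., the knowledge has already been demonstrated.
    """
    route: List[str] = []
    for name, probe_answer, frames in segments:
        if probe_responses.get(name) == probe_answer:
            continue  # branch past material already known
        route.extend(frames)
    return route


SEGMENTS: List[Segment] = [
    ("fractions", "3/4", ["f1", "f2"]),
    ("decimals", "0.75", ["d1", "d2", "d3"]),
]
print(branch_route(SEGMENTS, {"fractions": "3/4", "decimals": "0.7"}))  # ['d1', 'd2', 'd3']
```

Because a learner who passes a probe sees fewer frames, the sketch makes it easy to see why branching can shorten training time; it says nothing about criterion performance.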

For certain kinds of subject matter, 
vanishing is still another concept for 
programing (Skinner, 1958). A com- 
plete or nearly complete stimulus is 
presented to the subject. Subsequent 
frames gradually omit part of the 
stimulus until all of it is removed. 
The subject is then required to recon- 
struct the stimulus. 
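A vanishing sequence can be generated mechanically from the complete stimulus. The sketch below is hypothetical: it hides one more word in each successive frame, by default working from the last word back to the first, until the final frame shows only blanks and the learner must reconstruct the whole stimulus. The removal order, the blank marker, and the name `vanishing_frames` are assumptions.

```python
from typing import List, Optional, Sequence


def vanishing_frames(stimulus: str, order: Optional[Sequence[int]] = None,
                     blank: str = "____") -> List[str]:
    """Build a fading sequence in which each frame hides one more word.

    order: word positions to remove, in sequence (hypothetical choice;
    defaults to removing words from last to first).
    """
    words = stimulus.split()
    if order is None:
        order = range(len(words) - 1, -1, -1)
    hidden = set()
    frames = [" ".join(words)]  # first frame: the complete stimulus
    for position in order:
        hidden.add(position)
        frames.append(" ".join(blank if i in hidden else w
                               for i, w in enumerate(words)))
    return frames


for frame in vanishing_frames("to be or not"):
    print(frame)
```

Other removal orders, such as hiding the most predictable words first, can be passed through `order` without changing the function.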




To program verbal learning se- 
quences, Homme and Glaser (1959) 
suggest the Ruleg. With this method, 
the written program states a rule and 
provides examples for this rule. In 
each case, either the rule or the ex- 
ample is incomplete, requiring the 
subject to complete it. 
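The Ruleg pattern pairs a rule with examples and leaves one member of each pair incomplete. The sketch below is a hypothetical illustration: it alternates between an incomplete rule with a complete example and a complete rule with an incomplete example. The alternation, the name `ruleg_frames`, and the grammar rule used are assumptions, not Homme and Glaser's materials.

```python
from typing import List, Tuple


def ruleg_frames(rule: str, rule_key: str,
                 examples: List[Tuple[str, str]], blank: str = "____") -> List[str]:
    """Pair the rule with each example, leaving one of the two incomplete.

    rule_key: the word to omit from the rule.
    examples: (example sentence, word to omit from it) pairs.
    Only the first occurrence of the omitted word is blanked.
    """
    frames = []
    for i, (example, example_key) in enumerate(examples):
        if i % 2 == 0:  # incomplete rule, complete example
            frames.append(f"{rule.replace(rule_key, blank, 1)} Example: {example}")
        else:  # complete rule, incomplete example
            frames.append(f"{rule} Example: {example.replace(example_key, blank, 1)}")
    return frames


frames = ruleg_frames("A verb agrees with its subject in number.", "number",
                      [("She runs.", "runs"), ("They run.", "run")])
for frame in frames:
    print(frame)
```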

In a recent study Silverman 
(1960b) investigated methods of pre- 
senting verbal material for use in 
teaching machines. He recommended 
that further research involving the 
design and use of teaching machines 
should take into consideration the 
possible use of context cues as a means 
of facilitating serial rote learning. At 
the same time, however, he stated 
that continuous use of context cues 
as ancillary prompts should be 
avoided, since such prompts can 
interfere with learning. 

The optimum size of steps and the 
organization of the programed ma- 
terial are two formidable problems. 
Skinner (1958) states: 

Each step must be so small that it can always
be taken, yet in taking it the student moves
somewhat closer to fully competent behavior
(p. 2).

In order to determine the value of
steps in a program, Gavurin and 
Donahue (1961) investigated the ef- 
fects of the organization of the pro- 
gramed material on retention and 
rate of learning. They state that the 
assumption that optimum teaching 
machine programs are those in which 
items are presented in a logical se- 
quence has been validated for acquisi- 
tion but not retention. The results of 
a study carried out by Coulson and 
Silberman (1959) indicated that small 
steps were more time consuming but
resulted in statistically significant 
higher test scores on one of the cri- 
terion tests. Pressey (1959) in prin- 
ciple disagrees with Skinner’s no- 
tions of short and easy steps, and he 
strongly suggests an experimental in-
vestigation of this question. . . .
rate of learning and retention (recall
or recognition) are of critical concern.

The above discussion suggests sev-
eral areas which are directly appli-
cable to programing and which are
under investigation and/or need fur-
ther experimentation. Indeed, there
are a number of unanswered questions
in the programing complex, some of
which have been suggested by Gal-
anter (1959):

1. What is the correct order of 
presentation of material? 

2. Is there an optimum number of 
errors that should be made?

3. How far apart (in some sense) 
should adjacent items be spaced? 

4. Is experimentally controlled 
pacing more effective (in some sense) 
than self-pacing? 

5. Is one program equally effective 
for all students? 

6. What are the effects of
different programing techniques
(branching, intrinsic programing,
vanishing) in various subject-matter
areas?

7. What criteria are most appro-
priate in the evaluation of student
learning?

These questions are but a few of
the intriguing and complex problems
facing investigators in the new field
of programing material for teaching
machines. Answers to these questions
will help not only the educator but
also the engineer who is concerned
with writing adequate specifications
for the construction of teaching ma-
chines.


Response Mode 


The kind of response that should
be given by a subject has been a
controversial question in the teaching
machine field. Pressey’s original ma-
chine (1926) required the subject to
press a lever corresponding to his
choice of answer. The format of the 
answers was multiple-choice. Skinner 
(1958) emphasized the necessity of 
having the subject compose (con- 
struct) the response. Skinner states: 


One reason for this is that we want him to 
recall rather than recognize—to make a re- 
sponse as well as see that it is right. Another 
reason is that effective multiple-choice mate- 
rial must contain plausible wrong responses, 
which are out of place in the delicate process of 
“shaping” behavior because they strengthen 
unwanted forms (p. 2). 


Coulson and Silberman (1960) in- 
vestigated this question of multiple- 
choice vs. constructed response by 
using simulated teaching machines— 
human beings were used instead of 
automatic control mechanisms. Their 
results indicated that the multiple- 
choice response mode required sig- 
nificantly less time than the con- 
structed response mode and that no 
significant difference was obtained 
between response modes on the cri- 
terion test. Further, they reported 
that no significant differences were 
obtained among the experimental 
groups on the multiple-choice cri- 
terion subtest or on the total (mul- 
tiple-choice plus constructed re- 
sponse) criterion test. Fry (1959) has 
discussed this response-mode ques- 
tion along with other variables, and 
he has carried out extensive research 
concerning constructed vs. mul- 
tiple-choice response modes. The re- 
sults of his study favor the use of con- 
structed response when recall is the 
objective of the learning. 

In addition to the basic contro- 
versy (which needs much more inves- 
tigation) between multiple-choice and 
constructed responses, there are sev- 
eral “variations on the theme” which 
are evident. Stephens (1953) has 
recommended that every wrong an- 
swer in a multiple-choice question
appear as a correct choice for another
item. He calls this program “inside
alternatives.” His data indicate that
there was no difference between con-
trol and experimental groups on a
criterion test using either nonsense
syllables or Russian unless each right
choice appeared as a wrong alterna-
tive for the three subsequent items.
The use of prompts in general has 
been shown to be an effective tech- 
nique in automated teaching (Cook, 
1958; Cook & Kendler, 1956; Cook 
& Spitzer, 1960). 
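Stephens' arrangement can be generated from the list of correct answers alone. In the hypothetical sketch below, the wrong alternatives for each item are the correct answers of the three items before it, wrapping around the list so that every item has three distractors; as a result, each right answer reappears as a wrong alternative for the three items that follow it. The wrap-around, the name `inside_alternatives`, and the sample transliterated Russian words are assumptions, and the answers are assumed to be distinct.

```python
from typing import List, Tuple


def inside_alternatives(answers: List[str], lag: int = 3) -> List[Tuple[str, List[str]]]:
    """Give each item wrong alternatives drawn from other items' right answers.

    The distractors for item i are the correct answers of the `lag` items
    before it (wrapping around), so every right answer reappears as a
    wrong alternative for the `lag` items that follow it.
    Returns (correct answer, distractors) for each item.
    """
    n = len(answers)
    if n <= lag:
        raise ValueError("need more items than the lag")
    return [
        (answers[i], [answers[(i - k) % n] for k in range(1, lag + 1)])
        for i in range(n)
    ]


answers = ["dom", "kot", "les", "mir", "nos"]
items = inside_alternatives(answers)
print(items[3])  # ('mir', ['les', 'kot', 'dom'])
```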

Using learning booklets, Goldbeck 
(1960) investigated the effect of re- 
sponse mode and learning material 
difficulty on automated . . . .
The three response modes used were: 
overt response (the subject was re- 
quired to construct a written re- 
sponse), covert response (the subject 
was permitted to think of a response), 
and implicit response (the subject 
read the response which was under-
lined).

. . . implicit (reading) condition
. . . significantly more . . . than the
overt response condition. . . . covert response
condition fell between the other conditions in
learning efficiency (p. 25).


Concerning quiz-score results, the
overt response group . . .

Goldbeck concludes that doubt
is cast upon the assumption that the
. . . is achieved by use of easy items
and requiring written responses
(p. 26).

To the author's knowledge, the use
of an oral response in conjunction
with the Skinner teaching machine
and its effect on learning rate and 
retention have not been reported in 
the literature. Furthermore, the im- 
portance of response mode as a func- 
tion of reinforcement must be speci- 
fied. Israel (1958) has suggested that 
natural and artificial reinforcement 
may affect the subjects’ learning. A 
most comprehensive analysis of re- 
sponse-mode and feedback factors 
has been reported by Goldbeck and 
Briggs (1960). 

The general area of reinforcement 
suggests problems related to the 
drop-out feature of teaching ma- 
chines. Pressey’s (1927) original ma- 
chine dropped items after the correct 
answer had been given twice. Skin- 
ner’s machines at the Harvard Psy- 
chological Laboratory also have the 
drop-out feature, although the com- 
mercially available machines based 
on Skinner’s design do not incorpo- 
rate this feature. With reference to a 
study carried out at Harvard, Hol- 
land (unpublished) reported signifi- 
cantly superior performance when the 
drop-out feature was used. 

If items are dropped, the sequence 
of items is of course changed. How 
important is the sequence? If items 
should be dropped, by what criterion 
of learning can one justify omitting 
an item from the sequence? If items 
are not dropped and the criterion for 
the learning procedure is a complete 
run (i.e., once through the sequence 
without error), what is the effect upon 
retention? Being correct is positive 
reinforcement; thus, some items un- 
der these circumstances will receive a 
greater amount of positive reinforce- 
ment than others. What would be 
the effect of additional reinforcements 
with or without drop-out? Again, a
plethora of problems and a paucity of
answers! 

Response time, another important 
variable, has been investigated by
Briggs, Plashinski, and Jones (1955).
Their study suggests that there is no 
difference between self-paced and 
automatically paced programs as de- 
terminers of response time. However, 
the problem of pacing for individual 
items is still a recent one and needs 
further research. Another aspect of 
response time—the distribution of 
practice—has been studied exten- 
sively since Ebbinghaus’ investiga- 
tion in 1885. For example, Holland 
(unpublished) states that in an ex- 
periment at Harvard ‘‘a few students 
completed all the disks in a small 
number of long sessions while others 
worked in many short sessions. . . . 
Apparently the way practice was dis- 
tributed made little difference” (p. 4). 
Nevertheless, the distribution of prac- 
tice, like the problem of pacing, is yet 
a subject of controversy, with most 
investigations favoring some form of 
distributed practice (Hovland, 1951). 

The above section outlines briefly 
some of the major problems associ- 
ated with the variables affecting re- 
sponse mode. Although some of the 
variables have already been investi- 
gated, these and others, together 
with their interactions, need further 
research. 


Knowledge of Results 


There are many peripheral prob- 
lems related to teaching machines, 
one of which is the effect of immediate 
knowledge of results on learning.
Angell (1949), using a multiple- 
choice punchboard technique, found 
that “learning is significantly en- 
hanced by immediate knowledge of 
results.” Briggs (1949), also using 
the Punchboard, confirmed these re- 
sults. Bryan and Rigney (1956) 
noted superior performance when 
subjects were given knowledge of 
results, specifically, an explanation 
if the answer was incorrect. This last
study was later expanded by Bryan, 
Rigney, and Van Horn (1957), who 
investigated differences between 
three kinds of explanation given for 
incorrect response. None of the three 
types of explanation proved to be 
superior in teaching the subjects. 
Because of their controvertible re- 
sults, the above studies demonstrate 
that, although immediate knowledge 
of results appears to be effective in 
the learning process, this problem 
contains many facets which need 
more empirical data. 


Motivation 


One of the many reasons given for 
the effectiveness of teaching machines 
is that the student’s motivation is 
increased. Psychologists and edu- 
cators have realized for some time 
that the motivation variable ranks 
very high among those variables 
pertinent to learning. In 1958 and 
1959, Holland surveyed the use of the 
teaching machine in classes at Har- 
vard. He found that most students 
felt that they would have gotten less 
out of the course if the machines had 
not been used, that most students 
preferred to have machines used for 
part of the course, and finally that 
most students felt that the teaching 
machine was used by the instructor
“to teach me as much as possible with
a given expenditure of my time and 
effort.” During a field tryout of the 
Subject-Matter Trainer in the Semi- 
automatic Ground Environment Sys- 
tem, Mayer and Westfield (1958) 
observed that “motivation to work
with the trainer is high.” The super-
visory as well as the operational per-
sonnel encouraged the use of this
training technique. 

Mager (1959) suggests that moti- 
vation and interest are a function of 
the percentage of correct responses. 
He observed that in two young sub-
jects negative feelings for learning 
mathematics in the usual classroom 
situation did not transfer to learning 
mathematics by means of a teaching 
machine. The cause of this phenome- 
non is perhaps best explained by the 
subjects’ statement that, because 
they were able to understand the 
programed material, it did not seem 
to be mathematics at all. This inter- 
esting relationship between compre- 
hension and motivation needs further 
investigation. 


Equipment 

There are many inexpensive models 
of teaching machines which will soon 
hit the consumer market. For much 
of this equipment, there is very little 
experimental evidence which sup- 
ports the various designs. As previ- 
ously pointed out, Holland has col- 
lected data which support the effi- 
ciency of the drop-out feature in a 
teaching machine; yet commercial 
models presently available do not 
incorporate this feature, presumably 
because of its high cost. Generally, 
it seems that production is now and 
will continue to be out of phase with 
much of the research which has pro- 
vided necessary teaching machine 
specifications. Moreover, because of 
their expense, it is likely that some 
very important features will be 
omitted in manufacture. 

The methods of displaying pro- 
gramed material, another unexplored 
problem area, must be investigated 
so as to provide the design engineer 
with requirements based on empirical 
findings. The display problem is less 
acute, perhaps, with material for the 
elementary school than it is with pro-
grams designed to teach maintenance
procedures and aspects of the bio-
logical sciences.

The use of computer-controlled
teaching machines has been recom-
mended by many authors (Coulson 
& Silberman, 1959; Skinner, 1958). 
Utilizing a central computer, with 
many programs capable of adapting 
to individual needs and of providing 
stimulus materials to 50 or more 
students simultaneously, is a feasible 
notion for large-scale training pro- 
grams. With a computer, the display 
problem again becomes a major issue. 
Training in pattern recognition, in- 
formation handling, and display in- 
terpretation are but a few appropriate 
areas which should be studied. The 
alternate modes of presentation be- 
come more extensive as computer 
capacity increases. In the case of 
certain kinds of subject matter, a 
computer-generated pictorial dis-
play of information may be a more 
effective presentation than other dis- 
play techniques. Future research 
must solve these problems in equip- 
ment design. 


Teaching Machines and Other 
Techniques 


The use of automated teaching 
devices may be optimized, perhaps, 
if there is a proper balance between 
this technique and other compatible 
teaching methods. What percentage 
of a course should be machine taught? 
What subject matter is best suited to 
automated devices? If classroom 
courses were as carefully and thought- 
fully programed as some of the pro- 
grams currently being prepared for 
teaching machines, might some of the 
advantages of machines diminish? 
Perhaps some of the apparent ad- 
vantages of teaching machines are no 
more than methods of illustrating 
correctable classroom techniques! It 
might well be that the instructor’s 
enthusiasm and inspiration, a factor 
supposedly dominant in higher edu- 
cation, is vital in mastering a par- 
ticular subject-matter area. Will 
creativity in certain students be
harmed by extensive education via 
the machine? Again, consideration of 
the use of a teaching machine, the 
subject matter, the program, the 
level of education, and the techniques 
used in combination with the teach-
ing machine provides a fertile field for
experimentation. As of now, ques- 
tions in this area remain unanswered. 
Silverman (1960a) has presented an 
excellent, detailed discussion of prob- 
lems inherent in this new technology 
of automated teaching and the cur- 
rent trends in the field. 


PROBLEMS OF APPLICATION 


The most obvious problems in the 
attempt to use automated teaching 
techniques have been outlined in the 
previous section. There is still much 
of the unknown associated with 
techniques, machines, programing, 
etc. to be eliminated before a direct 
solution to a particular training prob- 
lem can be specified. Many alterna- 
tives exist, the best of which has not 
yet been determined. In addition to 
these voids, there is a serious lack of 
definition in the objectives of many 
training programs. 

What is the objective of a particu-
lar automated course or program?
From a pragmatic point of view, 
what are the criteria by which a 
specific educational program can be 
evaluated? For example, the objec- 
tives might range from the teaching 
of rote tasks to the presentation of
more abstract material. Needless to 
say, the techniques for both teaching 
and evaluating learning could be 
substantially different in each case. 
The purpose of teaching, the objec- 
tive of an educational program, must 
be initially defined. Only then will 
the concepts learning and teaching be 
meaningful in a particular context. 

After definition, the next step is to
determine what subject matter will
provide the student with the neces-
sary information. It is at this point 
that the major pitfall in education is 
likely to appear. Even though many 
training programs do not have a de- 
fined objective, their course content 
is nonetheless prescribed, and the 
text and/or materials used in previ- 
ous, nonautomated courses become 
the prime source of material for an 
automated teaching program. To 
program an automated teaching ma- 
chine with presently available ma- 
terials might well result only in a 
more efficient method of teaching 
the wrong material! 

The third step requires decisions in 
the selection of appropriate teaching 
techniques. Answers to questions 
involving programing, choice of 
teaching machine, learning proce- 
dures, pacing, and response modes 
are still not known. 

The fourth and last step requires 
an evaluation of the selected auto- 
mated teaching method in terms of 
the originally established objectives.
Conventional methods of instruction 
should be compared with the innova- 
tive methods by means of a specific 
set of criteria, e.g., in terms of train- 
ing time, job performance, retention 
of learned information, etc. 

The questions confronting the re- 
searcher in teaching machine technol- 
ogy are one example of the broader 
questions of man-machine interrela- 
tion. Data pertinent to the principles 
of human engineering, the optimum 
man-machine interaction, the degree 
to which the machine can perform 
functions formerly allocated to man, 
and the appropriate allocation of 
functions between man and machine 
will be provided by a research pro- 
gram investigating teaching ma- 
chines. Inadequate attention to any 
of the above-mentioned steps will 
result in failure to provide the needed 
answers in a field which may increase 
training effectiveness and reduce 
training costs.

How does a listener determine the
direction from which a sound comes? 
In the last century, physicists and 
physiologists usually gave a psycho- 
logical explanation; they held that 
the listener makes a judgment by 
comparing the differing stimulation 
at the two ears. In the present cen- 
tury psychologists and physiologists 
have been seeking a physiological 
explanation; they are attempting to 
find where and how nerve impulses 
originating at the two ears interact in 
the brain. In this paper we will trace 
the development of hypotheses con- 
cerning the mechanisms of auditory 
localization. (“Localization” will 
signify here the perception of the 
direction of a sound source; perception 
of distance will not be considered.) 

First, however, let us note briefly
the importance of auditory localiza- 
tion in animal and human life. Some 
animals owe their livelihood to their 
ability to localize. Thus, certain 
types of bat catch their insect prey in 
the dark by echolocation, and certain 
moths, in turn, attempt to detect and 
avoid these bats by auditory clues 
(Griffin, 1958). Human beings also 
have considerable ability to localize 
sounds. Localization provides the 
basis for the detection of obstacles by 
blind people, an ability which long 
remained a mystery under the am- 
biguous designation of “facial vision 
of the blind.” In the 1940s Cornell 
psychologists proved conclusively 
that this performance depends on lo- 
calization of echoes reflected from the 
obstacles (Cotzin & Dallenbach, 1950; 
Supa, Cotzin, & Dallenbach, 1944). 
Seeing persons, too, benefit greatly
from their ability to localize sounds. 
Localization makes it easier to listen 
to one signal or message in the pres- 
ence of competing signals—a task 
which communication engineers refer 
to as “the cocktail party problem” 
(Cherry, 1957). Because sounds can 
be discriminated more readily when 
they are heard as coming from differ- 
ent directions, binaural hearing aids 
and stereophonic recordings of music 
are becoming steadily more popular. 


INITIAL WORK ON THE MECHANISMS 
OF AUDITORY LOCALIZATION


The first person to investigate the 
nervous system in regard to localiza- 
tion seems to have been Louis Jurine, 
a naturalist, anatomist, and physi- 
cian of Geneva. Lazaro Spallanzani 
of Pavia had demonstrated in 1793 
that blinded bats could fly and avoid 
obstacles just as well as normal bats, 
but he could not imagine what sense 
then substituted for vision (1798). 
Jurine took up the question and 
decided that the solution must lie “at 
the tip of a scalpel.” Noting the large 
size of the external ear of the bat, he 
went on to find that “a considerable 
neural apparatus” was devoted to 
hearing. Unfortunately, the pub- 
lished extract of his account (1798) 
does not give any fuller indication of
Jurine’s neuroanatomical findings.
(The central connections of the audi-
tory nerve seem not to have been
discovered until almost a century
later. In the bat the cochlear nucleus
bulges out from the medulla right to
the tip of the cochlea, and it is possi-
ble that Jurine took this for a large
auditory nerve.) Following this lead, 
Jurine devised ingenious behavioral 
tests which provided clear evidence 
that the blinded bat guides itself 
by auditory clues. 


Venturi and the Intensity Hypothesis 


The first experiments on auditory 
localization by human observers seem 
to be those reported in two similar 
articles (1800a, 1800b) by Giovanni 
Battista Venturi, Professor of Physics 
at Modena and Pavia.¹ Venturi’s
experiments were similar to those 
that Lord Rayleigh performed inde- 
pendently 75 years later (1877). 
Venturi concluded that “The in- 
equality of the two simultaneous 
sensations of the two ears informs us 
of the true direction of the sounds” 
(1800a, p. 386). Part of his evidence 
was that sounds coming from directly 
in front of the observer could not be 
distinguished from sounds coming 
directly from the rear, if the observer 
kept his head still. Venturi also con-
cluded that a person with one deaf ear
must turn his head to localize and will
usually make errors in localizing
sounds that are very brief. (As we
shall see, certain recent experimenters
have ignored the role of head move-
ments in localization.)


1 This work of Venturi seems to have set
the style for most of the research on auditory
localization during the nineteenth century.
Yet, while some of his findings soon became
common knowledge, it was forgotten who had
discovered them. Thus Klemm, in his de-
tailed history of auditory localization (1914),
mentioned Venturi in a single sentence and
only in regard to effects of head movements.
ae (1901) and Boring (1942) did not men-
tion Venturi in their historical discussions of
localization; both ascribed the first experiment
on localization to E. H. Weber in 1848.

Venturi's first publication on this subject is
even earlier than those indicated above. A
paper of 1796 in French gives almost the same
material as that of the German articles of 1800
(see Venturi, 1796).

In 1801 Venturi published another report
on this research, appending it to an
edition of his book on physical re-
search on color (see Venturi, 1801). The con-
tents of this version are similar to those of the
German articles. Thus it appears that Venturi
made repeated attempts to secure a wide pub-
licity for this research.

Venturi noted that philosophers
had attempted to explain the single- 
ness of vision by the convergence of 
the two optic nerves, but he held that 
this was not the case for hearing: 
Since we distinguish the two simultaneous 
sensations of the two ears, and since their dif- 
ferent intensities furnish us knowledge of the 
true direction of the sound, therefore one 
must conclude that the two sound impressions 
do not mix together inside the skull (1800a, 
p. 388). 
Venturi furthermore concluded that 
the visual impressions of the two eyes 
do not mix, citing the phenomenon 
that was later to be called “retinal 
rivalry.” 

Venturi’s intensity theory re- 
mained the dominant explanation for 
localization until early in the twenti- 
eth century. It was propounded, for 
example, by Magendie (1831) and 
Johannes Müller (1840). Magendie
offered it as something evident and 
did not credit its discovery to anyone. 
Müller claimed that perception of 
direction of sound “is an act of judg- 
ment which founds it on experience 
previously acquired. . . . The only
true guide for this inference is the
more intense action of the sound upon 
one than upon the other ear” (p. 479). 
He further noted that when a sound 
comes from directly ahead or behind, 
it falls equally upon the two ears and 
is then impossible to localize; this 
demonstration he ascribed to “Ven-
turini” (sic).


PROGRESS IN THE SECOND HALF
OF THE NINETEENTH CENTURY


The first person to abandon the
judgmental interpretation of localiza-
tion seems to have been S. Scott
Alison, an English physician. He had 
invented the “differential stetho-
phone,” which consisted simply of two
stethoscopes, one for each ear. After 
using this instrument, Alison re- 
ported that a sound is restricted to 
the ear that receives it in greater 
intensity and is suppressed in the 
other ear. This, he remarked, 


holds apparently in virtue of a law seemingly 
established for the purpose of enabling man 
and the lower animals to determine the direc- 
tion of the same sound, with more accuracy 
than could be done had a judgment to be 
formed between the intensity of two similar 
sensations in the two ears respectively (1858, 
pp. 388-389). 


Alison was considerably ahead of his 
time in this formulation but his work 
seems to have had little influence. 
Sylvanus P. Thompson wondered 
about the basis of binaural beats, 
heard when he connected two slightly 
mistuned forks, one to each ear 
(1877). (Lord Rayleigh was to ob- 
serve later—1907—as Dove—1857 
had earlier, that the sound also 
changed location while beating.) 
Thompson rejected the hypothesis 
that bone conduction could account 
for binaural beats. Noting that the 
auditory nerves do not decussate as 
the optic nerves do, he concluded 
“that any means of comparison which 
may exist in the nerve systems of the 
ears exists deep-seated in the actual 
structure of the brain” (1877, p. 276). 
In his next paper he again noted that 
it is problematical where the sensa- 
tions from the two ears “blend,” and 
he remarked, “This point deserves 
the attention of anatomists and 
physiologists” (1878, p. 389). 


Tracing the Afferent Auditory Pathways

Physiologists and anatomists had, 
in fact, already turned their attention 
to the localization of sensory proc-
esses in the brain. The study of
cerebral localization of function had 
recently been given new impetus by 
the introduction of a new method— 
precise electrical stimulation, intro- 
duced by Fritsch and Hitzig in 1870. 
The revolutionary results obtained 
by this method led also to renewed 
interest in experiments involving 
precise ablation, as a check on the 
electrical experiments. David Fer- 
rier, Professor of Neuropathology in 
London, began extensive mapping of 
the brain of several species in 1873, 
using electrical and surgical tech- 
niques (1890). In the superior tem- 
poral convolution of monkeys and its 
homologues in other species, electrical 
stimulation produced the same reac- 
tion as if a shrill sound had been made 
in the contralateral ear. The animal 
pricked up or retracted the ear and 
often moved its head or eyes to that 
side. (Much later the same experi- 
ment was to be performed in con- 
scious surgical patients—Penfield & 
Rasmussen, 1950. In most cases, the 
human subjects reported hearing 
sounds on the side contralateral to 
the stimulated hemisphere; some 
sounds were heard “bilaterally”; no 
sounds were heard ipsilaterally.) Fer- 
rier also claimed that ablation of the 
auditory cortex of both hemispheres 
made monkeys inattentive to sound.
Heschl in 1878 succeeded in tracing 
the auditory tracts to the superior 
temporal convolution, and the audi- 
tory area is often given his name. 
Luigi Luciani of Florence used the 
ablation technique extensively in
cortical mapping (Luciani, 1884;
Luciani & Seppilli, 1886; Luciani &
Tamburini, 1879). One behavioral
test devised by Luciani has recently
been reintroduced by Riss (1959). In
this test, bits of food were thrown to
the floor near the blindfolded animal,
and the accuracy of its reactions to
the sounds was noted. Luciani con-
firmed that the auditory area of the
cortex is located posteriorly in the 
temporal lobe. He found that 

each ear [has] connections with both auditory 
spheres, but chiefly with that of the opposite 
side. In fact, every unilateral extirpation of 
sufficient extent in the province of the audi- 
tory sphere causes a bilateral disorder of hear- 
ing, more marked on the opposite side... 
(1884, p. 155). 


Thus, after an ablation in the right 
hemisphere, these results were re- 
ported by Luciani and Seppilli: 
Hearing is affected on both sides, but more at 
the left than at the right ear. The animal 
shows that it hears the sound of pieces of food 
falling to the left, but it mistakes the direction 
and turns to the other side. At the right ear 
this does not happen (1886, p. 79). 


The effects of unilateral lesions gener- 
ally disappeared within a few weeks; 
they persisted longer the larger the 
lesion. None of the lesions covered 
the whole auditory area as it is now 
defined, so it cannot be told from 
these studies whether a complete 
unilateral lesion would have led to 
some permanent impairment of local- 
ization. Bilateral lesions of the audi- 
tory areas were found to produce 
permanent perceptual impairment, as 
this example indicates: 

When called suddenly, the dog reveals through 
its movement that it is not deaf; but it does 
not follow and does not turn its head toward 
the sound, but, in fact, often even turns to the 
other side; in short, it seems not to understand 
what it hears and not to perceive the direction 
of sounds (psychic deafness) (1886, p. 119).


Luciani concluded that in the audi-
tory system, just as in the visual sys-
tem,

We must distinguish a crossed and a direct 
fasciculus; the former consisted of a much 
larger number of fibers than the latter. 
Neither of these fasciculi possesses any uni-
form relation with distinct segments of their
respective cortical spheres, but their fibers ir- 
radiate themselves throughout the area of 
these centres (1884, p. 155). 


Histological degeneration studies of 
Baginsky, Flechsig, and von Mona-
kow soon revealed rather completely
the course of the auditory pathway, 
crossing in part in the brainstem and 
proceeding by way of the inferior 
colliculus and medial geniculate body 
to the cortex of the temporal lobe 
(Ferrier, 1890). 

After these early achievements, 
progress not only lagged but some of 
the findings were even forgotten by 
workers in the field. Thus, for ex- 
ample, it was taken as surprising in 
1928 when removal of one cerebral 
hemisphere of a patient did not de- 
stroy hearing in the opposite ear 
(Bunch, 1928). 


THE EARLY TWENTIETH CENTURY:
THE PHASE HYPOTHESIS 


Although the role of dichotic phase 
differences in auditory localization 
had been shown by Dove (1857), 
Thompson (1877), and others in the 
nineteenth century, the phase hy- 
pothesis was firmly established only 
in the twentieth century. Lord
Rayleigh had considered this hy-
pothesis previously, but his advocacy
of it in 1907 convinced others. Ray-
leigh showed that while dichotic
intensity differences permit localiza-
tion of high frequency sounds, it is
dichotic phase differences that per-
mit localization of low frequency
sounds. This belated recognition of
the dual basis of localization might
have been expected to embarrass
supporters of the judgmental posi-
tion. In fact, Rayleigh did remark,
“Perhaps it is not to be expected that
we should recognize intuitively the
very different basis upon which our
judgment rests in the two cases”
(1907, p. 203). Nevertheless he never
abandoned the judgmental interpre-
tation of localization.
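Rayleigh's division of labor between the two cues can be illustrated with a little arithmetic. For a pure tone of frequency f, a dichotic time difference Δt appears as an interaural phase difference of 2πfΔt; because phase is only defined modulo 2π, once the largest time difference the head can produce exceeds half a period, a given phase difference no longer tells which ear leads. The Python sketch below is illustrative only: the maximum head-produced time difference of 0.66 ms is an assumed round figure, the helper names are hypothetical, and the frequency limit it computes is a geometric bound under that assumption, not a measured perceptual threshold.

```python
import math

# Assumed, illustrative value: the largest interaural time difference a human
# head produces (roughly 0.66 ms, for a source directly to one side).
MAX_ITD_S = 0.66e-3


def interaural_phase(freq_hz: float, itd_s: float) -> float:
    """Interaural phase difference (radians) produced by a time difference itd_s."""
    return 2.0 * math.pi * freq_hz * itd_s


def phase_is_unambiguous(freq_hz: float, max_itd_s: float = MAX_ITD_S) -> bool:
    """True if every ITD in [-max_itd_s, max_itd_s] maps to a distinct phase.

    Phase is only known modulo 2*pi, so a lead of more than half a period
    at one ear cannot be told apart from a lag at the other ear.
    """
    return interaural_phase(freq_hz, max_itd_s) < math.pi


def ambiguity_limit_hz(max_itd_s: float = MAX_ITD_S) -> float:
    """Frequency at which the maximum ITD equals half a period."""
    return 1.0 / (2.0 * max_itd_s)


if __name__ == "__main__":
    print(f"phase cue becomes ambiguous above ~{ambiguity_limit_hz():.0f} Hz")
    for f in (250, 500, 1000, 2000, 4000):
        print(f, phase_is_unambiguous(f))
```

Under the assumed head size, phase cues are unambiguous at 250 and 500 Hz but not at 1000 Hz and above, which fits the qualitative picture of phase serving low frequencies and intensity serving high ones.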

Bowlker (1908), who experimented 
on the role of phase differences, 
speculated briefly about neural inter-
action:

we may suppose that the transmission of 
sound impulses through some specialized part 
of the auditory apparatus or brain takes a def- 
inite time from each ear, and that the point 
where the impulses meet is the focus that 
gives rise to the sensation of a sound image 
(p. 327). 


From the vantage point of the pres- 
ent, this seems to anticipate Jeffress’ 
hypothesis of 1948, but Bowlker’s 
suggestion is so terse that we cannot 
be sure.

The success of the Stenger test 
for unilateral auditory malingering 
(1907) could have been taken as evi- 
dence against the judgmental ap- 
proach to localization, but it does not 
seem to have been. The test, recently 
termed “the most reliable and effec- 
tive of all malingering tests” (Watson 
& Tolan, 1949), works in the follow- 
ing way: A subject who simulates 
deafness of one ear will report hear- 
ing a tone that is delivered only to his 
other ear. When the tone is next de- 
livered to both ears, and more in- 
tensely to the ear whose deafness is 
feigned, the subject will hear it only 
at this supposedly deaf ear. The 
malingerer will therefore give him- 
self away by reporting that he does 
not hear the sound, in spite of the 
fact that it is present in audible in- 
tensity at the admittedly good ear. 
Thus, it was clear to Stenger (and it 
should have been to all users of his 
test) that the listener hears only a 
single sound and does not compare 
separate sensations arising from the 
two ears. This advance in clinical 
testing had no apparent influence on 
the development of thinking about 
auditory localization. 


THE TIME HYPOTHESIS 


The hypothesis that dichotic time 
provides a basis for localization seems 
first to have been proposed seriously
by Mallock (1908) and first to have 
been demonstrated experimentally by 
Aggazzotti (1911). It was brought to 
wide attention in several publications 
at the end of the first World War 
(Klemm, 1918, 1920; Piéron, 1922; 
von Hornbostel & Wertheimer, 1920). 
Dichotic stimuli separated by as little 
as 30 microseconds were shown to be 
perceived toward the side of the prior 
component. The time hypothesis is 
incompatible with the judgmental ap- 
proach, for the dichotic time intervals 
which provide for localization lie 
under the threshold of fusion of suc- 
cessive auditory stimuli. That is, 
dichotic stimuli with a time interval 
less than 2 milliseconds give rise to 
perception of a single localized audi- 
tory event; there are not two per- 
ceptual events that can be compared 
in order to judge localization. Per- 
haps the first to recognize the in- 
compatibility of the time hypothesis 
and the judgmental approach were 
Kreidl and Gatscher (1923). Their 
conclusion was to reject the time 
hypothesis! Since they showed that 
stimuli must be separated by about 
20 milliseconds to be judged as suc- 
cessive, they denied that smaller 
intervals could have any effect in
perception. von Hornbostel (1926)
showed the fallacy of this argument.
Furthermore, any observer who at-
tempted to test the time hypothesis
could verify it. The success of the
time hypothesis thus helped to over-
come the judgmental approach and
to clear the way for work on the
physiological mechanisms of localiza-
tion.


Hypothesized Central Mechanisms 


After the role of time difference
was demonstrated, several further
hypotheses about the mechanisms of
localization were soon proposed. von
Hornbostel (1926) suggested that in-
tensity differences were converted
into time differences in the auditory 
system, a stronger stimulus evoking 
neural responses with less latency 
than a weaker stimulus. Kemp and 
Robinson (1937b) were able to dem- 
onstrate that the latency of auditory 
impulses does decrease with intensity, 
but only within 40 db. of threshold. 
Stevens and Davis in their book, 
Hearing, Its Psychology and Physi- 
ology (1938), mentioned the work of 
Kemp and Robinson and concluded 
that the effect of intensity differences 
cannot result solely from changes in 
latency. Their reason was that 
changes in the binaural intensity 
ratio can shift the location of intense 
tones. In fact, the change of ratio had 
been found to be smallest when the 
tone was about 80 db. above thres- 
hold (Upton, 1936); at this level 
there is no longer a change in latency, 
according to Kemp and Robinson's 
results. (This point, we may note, is 
all that Stevens and Davis had to say 
about the physiological correlates of 
auditory localization.) Later re- 
search (e.g., Pestalozza & Davis, 
1956) has shown that latency con- 
tinues to decrease with intensity up 
to at least 70 db. above threshold; 
this gives new support to von Horn- 
bostel’s hypothesis. 

Boring (1926) suggested that the 
locus of cortical excitation might be 
the physiological correlate of localiza- 
tion. He hypothesized that the ears 
project, in each cerebral hemisphere, 
to cortical areas that are not coin- 
cident but which overlap. If one ear 
was stimulated either earlier or more 
strongly than the other, then the 
cortical excitation would be located 
mainly in the projection area of that 
ear. 

Trimble (1928) proposed a vague 
central hypothesis in which the inter-
aural differences are transmitted to
the cortex where localization occurs. 


The directional localization of a sound source, 
under ordinary conditions of hearing, depends 
upon the configurational nature of the cortical 
effects that correspond to the physical “differ- 
ence-pattern” at the ears (p. 523). 


von Békésy (1930) proposed a 
rather detailed schema. He pictured 
a region of cells where the auditory 
tracts from the two ears join. Audi- 
tory localization would depend upon 
the proportions of the region that 
each side excited. Both greater in- 
tensity and prior arrival would favor 
the ear so stimulated. 

Woodworth described a possible
mechanism in the discussion of audi- 
tory localization in his text, Experi- 
mental Psychology (1938): 
It must be a unitary mechanism capable of 
turning the head in either direction and re- 
sponsive to nerve currents from both ears. 
When the currents arrive from both ears, but 
more from one ear, that side has the ad- 
vantage. When the current from one ear 
arrives at the central mechanism ahead of the 
other and gets in its work first, it has the ad- 
vantage (p. 533). 


Jeffress (1948) suggested a neural 
“mechanism for the representation of 
a time difference as place” in the 
auditory system. He pictured a
center where tracts from both ears
make common synaptic connections.
Within this center there are places
where the conduction time is slightly
longer from one ear than from the
other. If the two ears are stimulated
simultaneously, the impulses meet
and summate at the locus where the
conduction times from both sides are
equal. If one ear is stimulated before
the other, then the impulses meet at a
different locus, a locus where the
difference in conduction times com-
pensates for the dichotic time differ-
ence. Intensity difference is trans-
lated into time difference and this is
then handled in the same way.
Jeffress ventured that this mecha- 
nism might be located in the medial 
geniculate, relying on the electro- 
physiological evidence of Kemp and 
Robinson (1937a) that no binaural 
interaction could be found at the 
lateral lemniscus. Later Jeffress dis- 
avowed this location and suggested 
that if his hypothesized mechanism
exists, it exists in the accessory
nucleus of the superior olive (1958).
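Jeffress' proposal that a time difference is represented as a place can be simulated in a few lines. In this Python sketch, which is a toy illustration rather than Jeffress' own formulation, two delay lines run in opposite directions through a row of coincidence detectors: detector i receives left-ear impulses after i delay steps and right-ear impulses after (n - 1 - i) steps. All numbers (41 detectors, 10 µs per step, a 5 µs coincidence window, a train of 20 clicks 1 ms apart) are assumptions chosen for clarity; with them the array resolves time differences in 20 µs steps. The conversion of intensity differences into time differences is not modeled.

```python
from typing import List


def click_train(n_clicks: int, period_us: float) -> List[float]:
    """Times (µs) of a regular train of clicks, one impulse per click."""
    return [k * period_us for k in range(n_clicks)]


def jeffress_responses(itd_us: float, n_detectors: int = 41,
                       step_us: float = 10.0, window_us: float = 5.0,
                       n_clicks: int = 20, period_us: float = 1000.0) -> List[int]:
    """Coincidence counts along a hypothetical Jeffress-style delay line.

    itd_us > 0 means the right ear is stimulated later than the left.
    Detector i receives each left-ear impulse after i * step_us and each
    right-ear impulse after (n_detectors - 1 - i) * step_us.
    """
    left = click_train(n_clicks, period_us)
    right = [t + itd_us for t in left]
    counts = []
    for i in range(n_detectors):
        left_arrivals = [t + i * step_us for t in left]
        right_arrivals = [t + (n_detectors - 1 - i) * step_us for t in right]
        # Count left/right arrival pairs close enough in time to summate.
        hits = sum(1 for a in left_arrivals for b in right_arrivals
                   if abs(a - b) <= window_us)
        counts.append(hits)
    return counts


def place_of_max(itd_us: float, **kwargs) -> int:
    """Index of the most strongly driven detector: time coded as place."""
    counts = jeffress_responses(itd_us, **kwargs)
    return max(range(len(counts)), key=counts.__getitem__)


if __name__ == "__main__":
    for itd in (0, 100, -60):
        print(itd, place_of_max(itd))
```

With simultaneous stimulation the central detector (index 20) responds; delaying the right ear by 100 µs moves the peak to detector 25, and letting the right ear lead by 60 µs moves it to detector 17. In this scheme it is the location of the maximum, not its size, that carries the dichotic time difference.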


LOCALIZATION INVESTIGATED 
BY PHYSIOLOGICAL TECHNIQUES 


While these hypothetical mecha- 
nisms were being proposed, further 
physiological findings were being ob- 
tained by both ablation and electro- 
physiological techniques. 


Information from Ablation Studies

Pavlov (1927) reported a finding 
of Bikov that a dog could not learn to 
discriminate between right and left 
positions of a sound source after the 
corpus callosum had been transected 
(p. 150). Girden (1940) found, on the 
contrary, that dogs retained a learned 
right-left discrimination of a tone or 
bell after transection of the corpus 
callosum. Girden’s animals were 
trained to respond by flexing a leg 
when the sound came from one side 
and not to respond when it came from 
the other. Each sound lasted for 2 
seconds. They also retained the dis- 
crimination after hemidecortication, 
losing it only after complete bilateral 
ablation of the auditory cortex. Even 
after bilateral ablation, the dog could 
still orient itself to sound, but it could 
not be retrained to show the condi- 
tional discrimination. 

ten Cate (1934) reported that
decorticate cats could orient to a 
sound stimulus. He used various 
sounds, all of them lasting for 15 to 20 
seconds. The physiologists Bard and 
Rioch (1937) also reported that de-
corticate cats could localize sounds
accurately, but they did not offer any 
quantitative observations. 
Measurement of impaired ability
to localize by cats with bilateral abla-
tions of the auditory cortex was fur-
nished by Neff and his collaborators
(see Neff & Diamond, 1958, for a his-
tory of several stages of this re-
search). Cats were trained to go to
the one of two boxes behind which a
buzzer sounded. Intact animals
could do this when the boxes were
only 5 degrees apart, the angle being
measured from the point at which the
cats were released into the test area.
Animals with complete bilateral de-
struction of the auditory cortices
could discriminate only when the 
boxes were 40 degrees apart. Three 
different hypotheses were advanced, 
any of which might account for the 
observed results: (a) “An intact 
auditory cortex is necessary in order 
that the relationship between audi- 
tory signal and food reward may be 
learned” (Neff, Fisher, Diamond, & 
Yela, 1956, p. 510). Further experi- 
mentation refuted this hypothesis, 
since cats with bilateral ablation of 
auditory areas could learn to open a
single door when a buzzer sounded.
(b) “An intact auditory cortex is
essential for maintenance of attention
to an auditory signal.” (c) “An intact
auditory cortex is necessary for accu-
rate localization of sound in space”
(1956, p. 511). Neff and Diamond
(1958) also reported preliminary re- 
sults indicating that ability to localize 
in their tests 
is not affected by section of the corpus callo- 
sum, is affected very little if at all by section 
of the commissure of the inferior colliculus, 
but is severely affected by section of the
trapezoid body (p. 108). 


Riss (1959) noted that auditory 
signals of relatively long duration had
been employed in the studies of ten
Cate and Neff. He raised the ques- 
tion whether binaural localization 
was actually tested in their work, 
since it is well known that monaural 
localization is possible if head move- 
ments can be performed while the 
sound continues. (It will be remem- 
bered that Venturi had made this 
point in 1800.) Riss pointed out that 
the animals of Bard and Rioch oriented
to the stimulus slowly, using notice- 
able head and ear movements. Riss 
therefore employed both very brief 
sounds and sounds lasting as long as 
30 seconds in experiments with cats. 
For brief sounds, bits of food were 
thrown down beside the cat; this is 
the method that Luciani had used 75 
years previously, although Riss evi- 
dently did not know of his work. The 
results replicated and extended those 
of Luciani. Cats with control lesions 
were successful with both types of 
signal. Cats with bilateral ablation of 
the auditory areas “showed evidence 
of being unable to orient to brief 
sounds but were partially successful 
in seeking out the region of the sound 
if the sound was prolonged” (p. 383). 
Tests revealed that these animals
could maintain attention to sound. 
Riss therefore concluded “that the 
auditory cortex is necessary for lo- 
calizing the instantaneous position of 
a sound” (p. 383). 


Information from Electrophysiological Studies

Wherever in the brain the tracts 
from the two ears converge function- 
ally, it should be possible to find 
interaction between the electrophysi- 
ological responses. Kemp and Robin- 
son (1937a) recorded from the brain 
stem of the anaesthetized cat while 
presenting monaural or binaural
tones or clicks. They interpreted 
their results as showing no binaural
interaction at the level of the lateral 
lemniscus, arguing “against the com- 


Dow on the cat (1939). Bremer and 
Dow reported that the response to 
stimulation of either ear was greater
at the contralateral cortex. In con-
trast to this, Woolsey and
evidence of binaural interaction at
the cortex. Since his dichotic stimuli 
were usually separated by intervals of 
several milliseconds, it remained to 
determine whether dichotic intervals 
of a fraction of a millisecond could be 
preserved in afferent transmission all 
the way up to the cortex and used 
there for binaural interaction. Al- 
though transmission from the cochlea 
to the cortex requires about 10 milli- 
seconds, dichotic time intervals of 
one-tenth of a millisecond were found 
to affect response amplitudes signifi- 
cantly (Rosenzweig, 1954; Rosen- 
zweig & Rosenblith, 1950). The rela- 
tion between amplitudes of simul- 
taneous responses at the auditory 
areas of the two hemispheres was 
shown to correlate with auditory 
localization. “At either hemisphere 
the amplitude of the summated re- 
sponse is larger when the contra- 
lateral ear receives the prior stimu- 
lus” (Rosenzweig & Rosenblith, 1950, 
p. 879). “The cortical events were 
found to parallel in several respects 
the perceptual phenomena which 
occur under the same stimulus condi- 
tions” (Rosenzweig, 1954, p. 275). 
Bremer (1952) arrived independently 
at the hypothesis that the relation be- 
tween the amplitudes of responses at 
the two hemispheres is the cerebral 
index to auditory localization. 

Tests were then made for binaural 
interaction at lower levels of the audi- 
tory system. Ades and Brookhart 
had suggested “that the inferior col- 
liculus with its strong commissural 
connections and connections to affer- 
ent mechanisms may be the principal 
device responsible for localization” 
(1950, p. 203). Interaction was found 
at the inferior colliculus (Coleman, 
1953; Rosenzweig & Wyers, 1955), 
but the importance attributed to the 

commissural connections by Ades 
and Brookhart was thrown into 
doubt by the following observations: 
Transecting the commissure of the
colliculi does not affect interaction
(Rosenzweig & Wyers, 1955); more-
over, this transection does not im-
pair auditory localization (Neff &
Diamond, 1958). At the colliculi, as
at the cortex, recordings made with
macroelectrodes show that stimula-
tion of the contralateral ear evokes
responses of greater amplitude than
does stimulation of the ipsilateral ear
(Rosenzweig & Wyers, 1955). Using 
microelectrodes, Erulkar (1957) found 
that of 89 single units tested with 
click stimuli, 23 responded only to 


units that responded to either ear, 
latency of response was nevertheless
shorter for contralateral than for
ipsilateral stimulation. Furthermore,
latency showed rather regular
changes as the position of the sti 


Proceeding further down the audi- 
tory system, evidence of binaural 
interaction was also found at the
lateral lemniscus (Rosenzweig &
Amon, 1955; Rosenzweig & Sutton, 


clusion of Kemp and Robinson 
(1937a) that tracts from the two 
ears do not converge before the col-
liculi. Kemp and Robinson had
found no signs of interaction when
they used stimuli dichotic in time,
but they gave no details, not even
the time intervals employed. Rosen-
zweig and Sutton, on the contrary,
presented measures of the reduction
in amplitude of the response to the
second stimulus as a function of the
dichotic interval.

The lowest level at which binaural
interaction occurs may be the supe-
rior olivary nuclei. Stotler (1953) re-
ported these findings concerning the
anatomy of the olivary nuclei:


The cells of the medial superior olivary nu- 
cleus receive afferent terminals in the form of 
boutons from both cochlear nuclei. The 
afferent fibers from the homolateral cochlear 
nucleus terminate on the lateral pole of the 
cell, while those from the contralateral coch- 
lear nucleus end in relation to the medial pole. 
The axons of the medial nucleus enter the 
homolateral lateral lemniscus (p. 420).


Thus the cells of this nucleus seem 
ideally situated to integrate informa- 
tion from the two ears. Subsequent 
electrical recording has in fact shown 
that some of the superior olivary 
nuclei show differences in responses 
depending upon time differences in 
stimulation of the two ears (Galam- 
bos, Schwartzkopff, & Rupert, 1959). 
Units in n. accessorius proved to be exquisitely 
sensitive to whether the right ear or the left 
was stimulated first by paired clicks; the 
unique physiological and anatomic character- 
istics of these cells seem relevant to the bin- 
aural sound localization problem (p. 527). 


Thus the primary neural interactions basic to localization may occur low in the afferent pathways.
It had been suggested in a preliminary report that interaction might occur at as low a level as the
cochlea itself, impulses being trans- 
mitted from one cochlea to the other 
(Galambos, Rosenblith, & Rosen- 
zweig, 1950). The transmission ap- 
peared to require about 1 millisecond, 
and some evidence of interaction was 
obtained when a click at one ear pre- 
ceded that at the other ear by 1.25 
milliseconds. In a later study no evi- 
dence of interaural interaction was 
obtained, using a dichotic interval of 
3.6 milliseconds (Rosenblith & 
Rosenzweig, 1951). It now appears 
that the latter interval may have 
been too great, since interaction at the lateral lemniscus can be observed clearly only if the dichotic interval is less than about 3 milliseconds
(Rosenzweig & Sutton, 1958). More 
telling evidence against the occur- 


where significant interaction was 


of the order of 1 millisecond, while 
localization is obtained with dichotic 
intervals of one-tenth of a millisecond or less. Thus the lowest level of the auditory system at which binaural interaction has been demonstrated is that of the olivary nuclei. The ablation studies nevertheless suggest that
the auditory cortex must be involved 
if neural interaction in the brain stem 
is to eventuate in behavioral dis- 
crimination of location of sound 
sources. 
Hemisphere? 
Walsh (1957) tested 22 patients 
i defects to find 


earphones. With most subjects only a 


order of 300 to 500 microseconds

nly reported for normal subjects. The hemispherectomized patient could localize with intervals of 410 and 190 microseconds but not with an interval of 125 microseconds.
Walsh concluded, “The sensitivity to 
binaural time differences is retained 
after the loss of the auditory cortex 
on one side” (p. 248). Since thres- 
holds were not determined, it cannot 
be concluded that there was no im- 
pairment of localization with loss of 
the auditory cortex of one hemi- 
sphere. 

Accurate thresholds for dichotic 

stimulation were determined for 
brain injured subjects in a study re- 
ported briefly by Teuber and Dia- 
mond (1956). Twenty patients with 
penetrating brain injuries, 14 of 
them unilateral, were compared with 
10 control subjects who had injuries 
of peripheral nerves. The brain in- 
jured subjects, compared to the con- 
trols, required a significantly larger 
dichotic interval to shift a click from 
the median plane; the thresholds 
were 225 and 105 microseconds, re- 
spectively. Similarly, the difference 
in intensities at the two ears neces- 
sary to shift a click from the center 
location was significantly larger for 
brain injured than for control sub- 
jects; the thresholds were 11 and 5 
db., respectively. Subjects with uni- 
lateral lesions in the right hemisphere 
required greater intensity on the left 
than on the right side in order to 
judge the sound at the midline, and 
conversely for subjects with left uni- 
lateral lesions. (This is similar to the 
results found by Luciani and Sep- 
pilli—1886—with unilateral abla- 
tions.) No such directional charac- 
teristic was found in the impairment 
of judgments involving dichotic time. 
The subjects who were impaired with 
respect to thresholds for dichotic 
time were not necessarily impaired 
with respect to thresholds for di- 
chotic intensity, and conversely. This 
suggested that the neural mecha- 
nisms for localization based on these 
two cues might not be identical. 


MARK R. ROSENZWEIG 


Coleman (1959) recorded electrical 
responses from several positions on 
the auditory cortex of anaesthetized 
cats while either moving a click 
source around the animal’s head or 
varying dichotic time and intensity 
of clicks produced at the two ears. 
The relative amplitudes of responses 
at different electrode positions varied 
with the location of the sound source 
or with the dichotic conditions. Some 
points gave larger responses to con- 
tralateral and some to ipsilateral 
stimuli. “These data suggest that
angular location of auditory stimuli 
may be represented in the auditory 
cortex of one hemisphere by means of 
a place principle” (p. 40). 

These observations are hard to rec- 
oncile with the results of experiments 
in which electrical stimuli were ap- 
plied to the auditory area of one 
cerebral hemisphere (Ferrier, 1890; 
Penfield & Rasmussen, 1950). It will 
be remembered that human subjects, 
under these conditions, usually hear 
a sound contralaterally to the side 
stimulated; sometimes they hear the 
sound on both sides, but never do 
they hear it ipsilaterally to the 
stimulation. This suggests that it is 
not possible to excite points in the 
right hemisphere that represent loca- 
tions in space to the right of the 
listener, nor points in the left hemi- 
sphere that represent locations to the 
left of the listener. 

On the basis of his findings Cole- 
man was inclined to reject what he 
termed “the bilateral ratio theory” 
of Rosenzweig. No sure conclusion 
concerning ability to localize can be 
drawn from Coleman’s observations, 
for the position of a sound source is also represented at the olivary nucleus, yet a cat cannot localize accurately using only the lower brain
centers. Should Walsh’s conclusion 
be substantiated that the cortex of 
one hemisphere suffices for normal auditory localization, then Coleman's
finding may take on increased im- 
portance. 

Formulation of a comprehensive 
hypothesis about the mechanisms of 
binaural localization requires an 
answer to the question whether the 
cortex of a single hemisphere is suffi- 
cient for normally accurate localiza- 
tion. This question may be resolved 
by precise determination of the 
capacity to localize in patients or 
animals with complete unilateral de- 
struction of the auditory cortex. 


SUMMARY AND CONCLUSION 


From the work of Venturi in 1800 
until about 1920, the perception of 
location of a sound source was gen- 
erally considered to be a judgment 
arrived at by comparing differences 
in the stimulation at the two ears. 
While Venturi showed that monaural 
localization was possible, if the 
listener could move his head during 
the presentation of the sound, the 
chief interest has always been in 
binaural localization. Over most of 
this period, dichotic difference in in- 
tensity was considered to be the only 
or the main stimulus basis for local- 
ization. Comparison of dichotic in- 
tensities seemed to be a plausible ex- 
planation for localization, even 
though Alison pointed out in 1858 that a sound nearer one ear is heard at that side only and seems to be suppressed in the other ear. While the judgmental interpretation and the intensity hypothesis reigned, there was little incentive and little effort to work out the physiological mechanisms of auditory localization. This was true in spite of noteworthy advances in knowledge of the anatomy of the afferent auditory system, beginning in the 1870s.

The establishment of the dichotic time hypothesis at the end of the first world war was quickly followed by abandonment of the judgmental position. This position could no longer be maintained when it was realized that the time intervals on which localization is based are too small to be perceived as intervals; only a single localized sound is heard. Psychologists soon proposed a number of speculative mechanisms for localization, involving interaction of neural impulses converging from the two ears upon some central locus. During the last 25 years a number of experimenters have brought ablation and electrophysiological techniques to bear on the problems of localization. They have recently shown that the cortex is required for binaural localization, although neural interaction first occurs low in the brain stem. Some evidence suggests that the cortex of a single hemisphere may be sufficient to permit localization. A completely satisfactory hypothesis of the mechanisms of binaural auditory localization, including both cortical and subcortical components, is yet to be presented.
PHYSIOLOGICAL EFFECTS OF “HYPNOSIS”
This paper reviews two series of
investigations: one series indicating 
that sensory, circulatory, gastro- 
intestinal, and cutaneous functions 
can be altered by means of “hypnosis”; and a second series indicating
that similar physiological effects can 
be produced by symbolic stimulation 
without “hypnosis.” 


SENSORY ALTERATIONS INDUCED
BY HYPNOTIC STIMULATION²


“Hypnotic Color-Blindness” 


To induce “color-blindness” in six 
“trained” hypnotic subjects (Ss), 
Erickson (1939) employed a com- 
plex procedure which included the 
following: gradual induction of a 
“profound somnambulistic hypnotic 
trance”; slow, gradual induction of 
“total blindness”; awakening of the S 
in the “blind” condition so that he 
would experience distress and anxi- 
ety; induction of a second “trance” 
condition; explanations to the S that 
vision would be restored but that a 
certain color or colors would not be 


1 The author is indebted to Louis B. Glass, Alejandro D. Paniagua, David R. Evans, and Harry Freeman for critically reading the manuscript. A condensed version of the paper was presented at the University of Kansas Seventh Annual Institute of Research in Clinical Psychology, May 10, 1960.

Research reported by the author and his associates was supported in part by a grant from the National Association for Mental Health and in part by Research Grant MY3253, from the National Institute of Mental Health, United States Public Health Service.

2 Since experimental and clinical studies of 
“hypnotic analgesia” have been recently re- 
viewed elsewhere (Barber, 1959, 1960a), this 
phenomenon is not included in the following 
discussion. 
detectable; suggestions of amnesia
for the critical color or colors; ad- 
ministration of the Ishihara during 
suggested (green, red, red-green, and 
total) color-blindness; administration 
of the Ishihara without suggested 
color-blindness in the waking state 
and in “the simple trance state.” 
The results of this involved experi- 
ment (which included 13 separate 
administrations of the Ishihara to 
each S) appeared to be as follows: all 
Ss had normal color vision during the 
waking state and in “the simple 
trance state’’; during suggested color- 
blindness, the numerals on some of 
the Ishihara cards were read in the 
manner characteristic of the green, 
red, red-green, or total color-blind. 
Erickson concluded that the hyp- 
notic procedure was effective in in- 
ducing “consistent deficiencies in 
color vision comparable in degree and 
character with those found in actual 
color blindness.” However, Grether 
(1940) criticized this conclusion not- 
ing that (a) “red-green color-blind- 
ness” does not exist in nature (this is 
a generic term referring to symptoms 
common to red-blindness and green- 
blindness); and (b) the deficiencies in 
color vision found among persons 
with actual red-blindness, green- 
blindness, or total color-blindness 
are “quite different” from those 
which Erickson attempted to induce. 
Harriman (1942) repeated part of 
Erickson’s procedure, suggesting am- 
nesia for red and then for green to 10 
“deeply hypnotized” Ss; although 
these Ss responded to the Ishihara in 
a manner similar to Erickson’s Ss,
Harriman concluded, in accordance 
with Grether's critique, that the
alterations induced “resemble atti- 
tudinal changes more closely than 
they resemble profound changes in 
sensory content.” However, no at- 
tempt was made to determine if the 
lengthy and involved hypnotic pro- 
cedure employed in the investigation 
was actually necessary to induce such 
“visual anomalies.” 

Barber and Deeley (1961) hypoth- 
esized that normal Ss, instructed to 
remain inattentive to red or green, 
give similar responses to the Ishihara 
as “hypnotic color-blind” Ss. As a
preliminary test of color vision, the 
American Optical Company Pseudo-Isochromatic Plates were administered to 10 normal Ss. The S was
then presented with the Ishihara 
plates and instructed as follows: 
“Now look at these cards. As I pre- 
sent each card, try as hard as you 
possibly can to pay no attention to 
the red. Look carefully at the rest of 
the card, but ignore the red; just 
don’t let yourself see it.” After com- 
pleting this task the Ishihara cards 
were presented again and similar 
instructions were given to “try as 
hard as you possibly can to pay no 
attention to the green.” Finally, the 
S was instructed to report what he 
naturally saw on the Ishihara plates. 
The results were as follows: (a) The 
responses to the Pseudo-Isochro- 
matic Plates and to the final admin- 
istration of the Ishihara indicated 
normal color vision in all Ss. (b) When 
instructed to “pay no attention” to 
red and then to green, 92 of 320 
(28.8%) of the total responses of the 
10 normal Ss were similar to the 
responses expected from persons with 
natural red-blindness or green-blind- 
ness. Of the 320 responses given to 
the Ishihara by Harriman's 10 “deeply hypnotized” Ss during suggested red-blindness and green-blindness, 85 (or 26.6%) were similar to
responses expected from the red- 
blind or green-blind. In brief, this 
experiment appears to indicate that normal Ss who are instructed to concentrate away from red or green give similar responses on the Ishihara as “deeply hypnotized” Ss who have been given elaborate suggestions to induce color-blindness.
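The two percentages in this comparison follow directly from the reported counts (92 and 85 responses out of 320 in each group). The short Python sketch below is added purely as an arithmetic illustration, not as part of either study's analysis; the group labels and the `percent` helper are names chosen for this example.

```python
# Counts of Ishihara responses resembling natural red- or green-blindness,
# as reported in the text; each group gave 320 responses in total.
TOTAL_RESPONSES = 320
color_blind_type_responses = {
    "Barber & Deeley (1961), waking Ss told to ignore red/green": 92,
    "Harriman (1942), 'deeply hypnotized' Ss given color-blindness suggestions": 85,
}


def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for group, count in color_blind_type_responses.items():
    print(f"{group}: {count}/{TOTAL_RESPONSES} = {percent(count, TOTAL_RESPONSES)}%")
```

Running it gives 28.8% for the waking Ss and 26.6% for the hypnotized Ss. The two figures differ by only about two percentage points, which is consistent with describing the two groups' responses as similar.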
“Hypnotic Blindness” 

Are hypnotic suggestions 
gators demonstrated that the magni-
tude of the photically evoked poten- 
tials was consistently reduced 
whenever “the attention of the sub- 
ject was distracted,” e.g., when in- 
structed to solve a difficult arith- 
metic problem mentally or when 
asked to recall an interesting experi- 
ence. From these experiments and 
from a series of related studies by 
other workers summarized in the 
paper, the authors suggest that during 


“voluntary attention” as well as by sugges- 
tion, transmission of photic impulses is modi- 
fied at the retina by centrifugal influences. 
These influences, acting during wakefulness, 
are probably related to organized activity of 
the reticular formation of the brain stem 
under the control of the cortex (p. 394). 


In earlier studies, Dorcus (1937), 
Lundholm and Lowenbach (1942), 
and other workers had noted that the 
pupillary reaction to light stimula- 
tion is not altered during “hypnotic 
blindness.” However, since pupillary 
constriction to light is found during 
some types of organic blindness (e.g., 
bilateral destruction of the occipital 
visual areas—Madow, 1958), this re- 
sponse is not a satisfactory index of 
blindness and workers in this area 
have generally focused on an osten- 
sibly more satisfactory response— 
alpha blocking on the electroen- 
cephalogram (EEG). 

Alpha blocking to photic stimula- 
tion appears to be a totally involun- 
tary response which is almost always 
present in normal persons and never 
present in the blind. A series of in- 
vestigations has demonstrated that 
(a) when the room is darkened and 
the eyes are closed, most normal 
persons typically show an alpha 
rhythm on the EEG (consisting of 

waves with a frequency of 8 to 13 
cycles per second and an amplitude 
of about 50 microvolts); (b) a light 
flashed into the closed eyes of these 
individuals is almost always effective
in causing “alpha block” or “alpha 
desynchronization” (i.e., in replacing 
the alpha rhythm with small fast 
waves) within 0.4 second (Jasper & 
Carmichael, 1935); and (c) persons 
with total blindness of neurological 
origin do not show alpha blocking 
under these conditions (Callahan & 
Redlich, 1946). 

Lundholm and Lowenbach (1942), 
Barker and Burgwin (1948), and 
Ford and Yeager (1948) found that 
hypnotic suggestions of blindness did 
not prevent alpha blocking when the 
Ss opened their eyes in an illumi- 
nated room. However, these experi- 
ments are based on a methodological 
error: In normal persons the act of 
opening the eyes per se—whether in 
darkness or in an illuminated room— 
almost invariably results in alpha 
desynchronization (Loomis, Harvey, 
& Hobart, 1936; Yeager & Larsen, 
1957). To determine if hypnotic 
suggestions of blindness are effective 
in preventing the alpha desynchroni- 
zation which normally occurs after 
visual stimulation, it is therefore 
necessary for the S either to keep his 
eyes continuously open or continu- 
ously closed during the experiment. 
These conditions have been met in 
three investigations. Loomis et al. 
(1936) demonstrated that when total 
blindness was suggested to an excel- 
lent hypnotic S whose eyes were kept 
open continuously with adhesive
tape, the alpha rhythm did not show 
desynchronization during photic 
stimulation. This was repeated 16 
times with the same results; whether 
the room was illuminated or darkened 
made no difference whatsoever—the 
alpha rhythm was continuously pres- 
ent until the S was told that he could 
once again see. In a subsequent ex- 
periment, Schwarz, Bickford, and 
Rasmussen (1955) found that after 


suggestions of blindness 7 of 11 
hypnotic Ss (with eyes taped open) 
showed occasional alpha waves when 
the room was illuminated. In a more 
recent study, Yeager and Larsen 
(1957) instructed five Ss to keep 
their eyes continuously closed during 
the experiment. Hypnotic and post- 
hypnotic suggestions were given that 
the S would not be aware of the light 
stimulation. In the majority of trials, 
no alpha blocking occurred when 
light fell upon the closed eyes. 

The above studies indicate that 

hypnotic suggestions of blindness are 
at times effective in eliminating an 
involuntary physiological response 
which normally follows visual stimu- 
lation, viz., alpha blocking on the 
electroencephalogram. However, a 
similar effect can be demonstrated in 
Ss who have not been given an “hyp- 
notic induction” and who do not 
appear to be in “the trance state.” 
Loomis et al. (1936) found that when 
a uniformly illuminated bowl was 
placed over the eyes of a normal 
person who was instructed not to 
focus on any specific part of the light 
pattern, the alpha waves appeared 
fairly regularly. Gerard (1951) 
writes: 
With a little practice I can look directly at a 
100-watt light . . . and, by deliberately pay- 
ing no attention to it, I can have my alpha 
waves remain perfectly intact; then with no
change except what I can describe in no other 
way than as directing my attention to the 
light, have them immediately disappear (p. 
94). (Quoted by permission of John Wiley & 
Sons.) 


Jasper and Cruikshank (1937) have published similar findings. In brief, although some “hypnotized” Ss, who have been given suggestions of blindness, continue to show an occipital alpha rhythm part of the time or all of the time when stimulated by light, a similar effect can be demonstrated in normal persons who are instructed to “pay no attention” to visual stimuli.

In a recent study Schwarz et al. 
(1955) found that five “hypnotized” 
Ss who had been given suggestions of 
blindness did not show eye move- 
ments when urged to look at an ob- 
ject. The restriction of eye move- 
ments was indicated both by electro- 
myographic eye leads and by the 
marked suppression of lambda waves 
on the EEG. These investigators 
suggest that the restriction of eye 
movements during hypnotic blind- 
ness “is an attempt to shut off all 
alerting stimuli that might interfere 
with the successful accomplishment 
of the suggestion.” Along similar lines,
Barber (1958b) presented evidence 
indicating that seven somnambu- 
listic hypnotic Ss deliberately refused 
to look at an object which they had 
been told that they could not see; 
observation of eye movements indi- 
cated that they typically focused on 
all parts of the room except where the 
object was situated. When inter- 
viewed after the experiment, most of 
the Ss readily admitted that they 
purposely refused to carry out the ac- 
tive process of turning the head and 
focusing the eyes on the object, e.g., 
“I was almost carefully not looking
at it,” “I kept looking around it or 
not on it.” 

In an earlier study, Pattie (1935) 
gave five good hypnotic Ss the sug- 
gestion that they were blind in one 
eye. Four responded to a series of vis- 
ual tests (stereoscope, perimetry, fil- 
ters, Flees’ box, plotting the blind 
spot, ophthalmological examination)
with normal vision in both eyes; how- 
ever, one S responded to all tests as if 
she were actually blind in one eye. In 
a second experiment the “blind” S 
was given a more complicated filter 
test; the results indicated that the 
“blind” eye was not impaired to the
slightest degree and Pattie concluded 
that the “former tests were thus in- 
validated.” When questioned in a 
subsequent hypnotic session, the S 
revealed, after much resistance, that 
she had given a convincing demon- 
stration of uniocular blindness be- 
cause of the following: during the 
stereoscopic test the two images were 
separated a second after exposure and 
this gave her the necessary knowl- 
edge to fake the test; she had prac- 
ticed determining the blind spot at 
home after the experimenter had first 
attempted to plot it; on the Flees’ 
box with crossed images she “saw 
there were mirrors in there and fig- 
ured somehow that the one on the 
left was supposed to be seen with the 
right eye,” etc. 

The above studies appear to indi- 
cate that the “good” hypnotic S, who 
has been given suggestions of blind- 
ness, purposely attempts to inhibit 
responses to visual stimuli. This sug- 
gests the following hypothesis which 
can be easily confirmed or disproved: 
The responses to photic stimulation 
which characterize “deeply hypno- 
tized" Ss who have been given sug- 
gestions of blindness can be dupli- 
cated by normal persons who are 
asked to remain inattentive and un- 
responsive to visual stimuli. 
“Hypnotic Deafness” 

Can significant alterations in audi- 
tory functions be demonstrated in the 
hypnotized person following sugges- 
tions of deafness? Fisher (1932) and 
Erickson (1938b) approached this 
question by investigating the effect of 
hypnotically-induced “deafness” on 
conditioned responses to acoustic 
stimuli. Fisher found that during 
posthypnotic deafness one S did not 

show a patellar response which had 
been conditioned to the sound of a 
bell; Erickson similarly demonstrated
that after hypnotic suggestions of 
deafness two Ss failed to show 
a hand-withdrawal response condi- 
tioned to the sound of a buzzer. Al- 
though both investigators interpret 
the failure to show conditioned re- 
sponses to auditory stimuli as a sign 
of deafness, earlier experiments, re- 
viewed by Hilgard and Marquis 
(1940, p. 35, pp. 269-270), which in- 
dicate that such conditioned re- 
sponses can be voluntarily inhibited, 
suggest a second interpretation, 
namely, that the “hypnotic deaf” Ss 
perceived the sound stimulus but 
purposely inhibited the response. 
Some support for this interpretation 
is offered by the kymographic trac- 
ings reproduced in Fisher's paper 
which show an aborted patellar re- 
sponse to some of the sound stimuli. 
Additional evidence is presented by 
Lundholm (1928) who, like Erickson, 
conditioned a hand-withdrawal re- 
sponse to an auditory stimulus; al- 
though the S in this case did not 
show the conditioned response after 
hypnotic suggestions of deafness, he 
later admitted “having heard the 
click, having felt an impulse to 
withdraw on click without shock, 
and having resisted and inhibited 
that impulse” (p. 340). 

As an additional index of deafness, 
Erickson (1938a) noted that his Ss 
did not show startle responses to sud- 
den loud sounds. Other investiga- 
tions, however, again suggest the 
possibility that the Ss may have per- 
ceived the sound and purposely in- 
hibited the startle response; for in- 
stance, Dynes (1932) reported that 
three “hypnotic deaf” Ss, who did 
not show overt startle responses when 
a pistol was unexpectedly fired, ad- 
mitted after the experiment that 
they heard the sound, and Kline, 
Guze, and Haggerty (1954) demonstrated that a “hypnotic deaf” S who
failed to show both conditioned re- 
sponses to auditory stimuli and 
startle reflexes to sudden loud sounds 
showed clear-cut responses to audi- 
tory stimuli when tested by a method 
employing delayed speech feedback. 
The latter experiment merits fur- 
ther comment. In the normal per- 
son, feeding back his speech through 
tape recording amplification and ear- 
phones with a delay of one-quarter 
second has been reported to produce 
an impairment in subsequent speech. 
Most commonly this speech disturb- 
ance involves stammering, stutter- 
ing, perseveration, and marked loss 
in speed and tempo. Kline et al. 
(1954) found that such delayed 
speech feedback produced distinct 
impairment in speech performance in 
an excellent hypnotic S who had been 
given suggestions of deafness. How- 
ever, as compared with his “waking” 
performance, the S showed less slur- 
ring, stuttering, and stammering, ap- 
peared more calm, and did not show 
discomfort. The investigators con- 
cluded that the hypnotic suggestions 
of deafness were effective in inducing 
a “set,” or in “gearing” the S, “to 
give minimal response to the excruci- 
ating intensity and the constant in- 
terference of the feed-back of his own 
voice” without in any way inducing 
“deafness in the usual sense.” How- 
ever, no attempt was made to deter- 
mine if the S would have shown a 
similar ability to tolerate the speech- 
disturbing stimulation during the 
“waking” experiment if he had been 
carefully instructed and motivated 
to remain inattentive to or to ‘‘con- 
centrate away from” the stimula- 
tion. Further experiments are re- 
quired to determine if normal per- 
sons are able to duplicate the behav- 
ior of this “deeply hypnotized” S 
when instructed in this manner.


Malmo, Boag, and Raginsky (1954) 
have reported comparable findings. 
After appropriate suggestions to in- 
duce deafness, two somnambulistic 
Ss denied auditory sensations and 
showed significantly reduced motor 
reactions to sudden auditory stimu- 
lation; however, myographic record- 
ings from eye muscles showed a 
strong blink reaction in both Ss at 
each presentation of the auditory 
stimulus. Sternomastoid tracings in- 
dicated that one S showed slight 
startle responses to all stimuli and 
the other S showed a strong startle 
reaction to the first presentation of 
the stimulus and slight startle reac- 
tions to subsequent stimuli. Other 
data presented in the report (e.g., in- 
trospective reports and myographic 
tracings indicating a higher level of 
tension in the chin muscles under 
hypnosis as compared to the control 
condition) permit the following in- 
terpretation of the findings: (a) the 
Ss were unable to inhibit blink responses to the auditory stimuli; (b) since the first presentation of the auditory stimulus was more or less
unexpected, one S failed to inhibit 
the startle response; (c) since the 
second and subsequent stimuli were 
expected, both Ss were able, to a 
great extent, to inhibit startle re- 
sponses. In an earlier study Malmo 
and his collaborators (Malmo, Davis, 
& Barza, 1952) found that when unexpectedly presented with an intense auditory stimulus, a hysterical “deaf”
patient also showed a gross startle 
response on the myograph; a control 
case of middle-ear deafness, studied 
by the same techniques, showed no 
blink reaction and no startle response 
to any presentation of the auditory 
stimulus. 

In an earlier study Pattie (1950) 
gave four somnambulistic hypnotic 
Ss suggestions of unilateral deafness. 
The Ss appeared to accept the suggestions, insisting that they could
not hear in one ear. However, when 
auditory stimuli were presented in 
such a manner that they could not 
determine which ear was being stim- 
ulated, they showed normal hearing 
in both ears. 

The above findings suggest a hypothesis similar to the one suggested by the studies of “hypnotic blindness” reviewed in the preceding section of this paper. “Hypnotic deaf” Ss purposely inhibit conditioned responses to auditory stimuli (Lundholm), appear to inhibit startle responses to sudden acoustic stimuli (Malmo et al., 1952), show a calmer attitude and less tension during speech-disturbing auditory stimulation but no sign of actual deafness (Kline et al., 1954), and do not show “deafness” in one ear when unable to determine which ear is being stimulated (Pattie, 1950). The hypothesis, then, is that normal persons who are carefully instructed and motivated to “concentrate away from” auditory stimulation will show responses to acoustic stimuli similar to those of “hypnotic deaf” Ss.


THE EFFECT OF HYPNOTIC STIMULATION
ON CIRCULATORY FUNCTIONS


Effect of Hypnotic Stimulation on 
Vasomotor Functions 
The evidence at present indicates 
that localized vasoconstriction and 
vasodilation (and a concomitant lo- 
calized skin temperature alteration) 
can be induced in some hypnotized 
persons by appropriate verbal stimu- 
lation. McDowell (1959) found that 
a good hypnotic S showed erythema 
with vasodilation and increase in 
skin temperature of the right leg fol- 
lowing suggestions that the leg was 
immersed in warm water. In a care- 
ful experiment, Chapman, Goodell, 
and Wolff (1959) suggested to 13 Ss 
“as soon as a state of moderate to
deep hypnosis had been established,” 
that one arm was either “normal” or 
that it was numb, wooden, and devoid of sensation (“anesthetic”).
The arm was then exposed on three 
spots, blackened with India ink, to a 
standard thermal stimulus (500 millicalories/second/centimeter² for 3 seconds). After an interval of 15 to 30
minutes “during which [time] hyp- 
nosis was continued,” it was sug- 
gested that the other arm was tender, 
painful, burning, damaged, and ex- 
ceedingly sensitive (“vulnerable”) 
and the same standard noxious stim- 
ulation was applied. The results of 40 
experiments with the 13 Ss were as 
follows: In 30 experiments the in- 
flammatory reaction and tissue dam- 
age following the noxious stimula- 
tion was greater in the “vulnerable” 
arm, in 2 experiments the reaction 
was greater in the “anesthetic” arm, 
and in 8 experiments no difference 
was noted. Plethysmographic and 
skin temperature recordings indicated that following the noxious stimulation local vasodilation and elevation in skin temperature were larger
in magnitude and persisted longer in 
the “vulnerable” arm. This experi- 
ment should be repeated with un- 
hypnotized Ss who are instructed to 
imagine one arm as “devoid of sensation” and the other arm as “exceedingly sensitive.” The data summarized below suggest that at least some
of the effects reported in this study 
—localized vasodilation and eleva- 
tion in skin temperature—can be in- 
duced by symbolic stimulation in 
some individuals who do not appear 
to be “in a state of moderate to deep 
hypnosis.” 
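The 30 to 2 split reported by Chapman, Goodell, and Wolff can be given a rough check with an exact two-sided sign test. This is an illustration only, not an analysis the investigators reported. It assumes that the 8 experiments showing no difference are discarded as ties and that the 40 experiments are independent, which is doubtful because they were carried out on only 13 Ss. The function name sign_test_two_sided is hypothetical.

```python
from math import comb


def sign_test_two_sided(successes: int, failures: int) -> float:
    """Exact two-sided sign test (hypothetical helper); ties must already be dropped."""
    n = successes + failures
    k = min(successes, failures)
    # Probability, under a fair 50:50 split, of a result at least this
    # lopsided in one direction; doubled for the two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Chapman, Goodell, and Wolff (1959): 30 experiments favored the "vulnerable"
# arm, 2 favored the "anesthetic" arm; the 8 ties are excluded (an assumption).
p = sign_test_two_sided(30, 2)
print(f"two-sided p = {p:.2e}")
```

Run as written, it prints a p of roughly 2.5e-07. This suggests the direction of the difference is unlikely to be a chance split, although repeated experiments on the same Ss are not independent, so the figure overstates the strength of the evidence.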

When attempting to condition local 
vasoconstriction and vasodilation to 
verbal stimuli, Menzies (1941) found 
that the conditioning procedure could 
be dispensed with in some cases;
some persons, who had not partici- 
pated in the experimental condition- 
ing, showed vasodilation in a limb 
when recalling previous experiences 
involving warmth of the limb and 
local vasoconstriction when recall- 
ing experiences involving cold. In an 
earlier study, Hadfield (1920) found 
that localized changes in skin tem- 
perature could be induced by sugges- 
tions given to a person “in the wak- 
ing state.” In this case, the S had 
exercised vigorously before the ex- 
periment and the temperature of 
both hands, as measured with the 
bulb of the thermometer held firmly 
in the palm, had reached 95°F. 
Without a preliminary hypnotic pro- 
cedure, it was suggested that the 
right arm was becoming cold. Within 
half an hour the temperature of the 
right palm fell to 68° while the tem- 
perature of the left palm remained 
at 94°. When subsequently given the 
suggestion that the right hand was 
becoming warm, the temperature of 
the hand rose within 20 minutes to 
94°. Although this S had previously 
participated in hypnotic experiments, 
Hadfield insists that he did not “hyp- 
notize” him during this experiment 
and that the temperature alterations 
occurred when the S was “entirely in 
the waking condition.” 


Cardiac Acceleration Produced by 
Hypnotic Stimulation 


A number of experiments, reviewed 
by Gorton (1949) and Weitzenhoffer 
(1953), demonstrate that cardiac ac- 
celeration can be produced by hyp- 
notic suggestions which activate the 
S and that cardiac deceleration can 
be produced by hypnotic suggestions 
of relaxation, drowsiness, and sleep; 
however, this finding indicates no 
more and no less than that an altera- 
tion in the “level of arousal” or 
“level of activation” (Duffy, 1957;
Woodworth & Schlosberg, 1954)— 
whether induced by stimuli present 
during various ongoing life situations 
or induced by symbolic stimulation 
during a hypnotic experiment—is 
correlated with an alteration in the 
heart rate. A more significant ques- 
tion is: Can the heart rate be accel- 
erated or depressed by direct sugges- 
tions of such an effect without simul- 
taneously inducing anxiety, emo- 
tion, or arousal? Solovey and 
Milechnin (1957) demonstrated an 
accelerated pulse rate in 2 out of 23 
hypnotic Ss following the direct sug- 
gestion “Your heart is beating more 
rapidly.” However, the possibility 
is not excluded that the cardiac ac- 
celeration in the two cases was due to 
emotion or anxiety evoked by the 
suggestions; one S later reported
that, when given the suggestion, he 
imagined himself looking down from 
a height and feeling someone push- 
ing him on the shoulder and the 
other S stated that, when given the 
suggestion, he had a feeling of dis- 
tress. Since relatively large changes 
in cardiac rate can be demonstrated 
during alterations in the rate and 
depth of respiration (Huttenlocher 
& Westcott, 1957), it also appears 
plausible that the altered pulse rate 
in these cases may have been an in- 
direct effect of a change in respira- 
tory pattern. 

To demonstrate a direct effect of 
symbolic stimulation on heart rate it 
is necessary to control at least two 
factors, “level of arousal” and res- 
piratory rate. To the writer’s knowl- 
edge only one hypnotic experiment 
has been published which ostensibly 
satisfies these criteria: Van Pelt 
(1954) reported that a somnambulis- 
tic hypnotic S showed an accelerated 
cardiac rate following direct sugges- 
tions of such an effect without at the 
same time showing an altered res-
piratory rate or emotional arousal.
After an “hypnotic induction” pro- 
cedure, this investigator spoke to the 
S in a quiet tone as follows: “Your
heart is beginning to beat faster. It 
is getting faster and faster. You are 
perfectly calm, but your heart is 
beating faster and faster.” The elec- 
trocardiogram (EKG) showed that 
the heart rate increased immediately 
from 78 to 135 beats per minute. Al- 
though Van Pelt states that he did 
not observe a change in the depth 
and rate of respiration during the 
tachycardia, it appears possible that 
an altered respiratory pattern could 
have been demonstrated if a pneu- 
mograph had been employed in the 
study. However, during the accelera- 
tion the S appeared calm and the 
EKG tracing did not show somatic 
tremors which are typical of nervous- 
ness and fear. In a second experi- 
ment, in which the same S showed 
cardiac acceleration following sugges- 
tions intended to arouse fear—“You 
are driving a car at a tremendous 
speed and are heading toward a sec-
ond car and are going to crash”—
the EKG recording showed clear evi- 
dence of somatic tremors. 

The above study lacks a crucial 
control; no attempt was made to de- 
termine if the S could voluntarily 
accelerate the heart without “hyp-
nosis.” Since other workers employ-
ing similar procedures with equally
“good” hypnotic Ss have failed to 
demonstrate cardiac acceleration 
(e.g., Jenness & Wible, 1937, failed 
in 30 attempts with eight somnam- 
bulists), it appears plausible that the 
hypnotic procedure was not a neces- 
sary factor in producing this effect. 
Supporting evidence for this supposi- 
tion is presented in a series of studies 
(ca. 20) which demonstrate that
some apparently normal persons are
able to accelerate the heart volun-
tarily (King, 1920). In most of the
reported cases the voluntary tachy- 
cardia was on the order of 15 to 40 
beats per minute; however, in some 
cases (Favill & White, 1917; Tar-
chanoff, 1885) the acceleration was
as high as 63 or 75 beats per minute.
In all cases the Ss denied that they 
induced the tachycardia by visualiz- 
ing emotion inducing situations and 
insisted that they produced the effect 
by voluntary effort. Some Ss showed 
changes in respiratory pattern dur- 
ing the voluntary tachycardia but 
in these cases the respiratory altera- 
tions varied and could not be cor- 
related with the change in heart rate 
(Koehler, 1914; Pease, 1889; Tar-
chanoff, 1885; Van de Velde, 1897);
other Ss could as readily induce the 
voluntary acceleration when breath- 
ing more or less normally as when 
showing changes in respiratory pat- 
tern (King, 1920; Taylor & Cameron, 
1922); and some Ss showed no sig- 
nificant change in respiratory pat- 
tern on the pneumograph when in- 
ducing cardiac acceleration on the 
order of 40 beats per minute (Favill 
& White, 1917). 

Voluntary acceleration of the heart 
may not be as uncommon as is gen- 
erally assumed: Van de Velde found 
four cases and Tarchanoff five cases 
when confining their search to rela- 
tively small groups of individuals; a 
number of medical students discov- 
ered that they possessed this ability 
in physiology classes when they at- 
tempted to determine the validity of 
the lecturer’s assertion that volun- 
tary cardiac acceleration is not im- 
possible (Ogden & Shock, 1939; West 
& Savage, 1918). 


Cardiac Standstill Induced by 
“Hypnosis” 


Raginsky (1959) demonstrated 
that hypnotic suggestions are effec- 
tive in producing cardiac block for a 
brief period in an appropriately pre-
disposed person. The S in this case
was a hospitalized patient whose car- 
otid sinuses had been surgically re- 
moved because of periodic fainting 
episodes with cardiac arrest (Adams- 
Stokes disease). After the patient 
“went into a medium to deep hyp- 
notic state,” he was instructed “in a 
tone of considerable urgency to vis- 
ualize with all clarity possible his 
worst attack of faintness.” The pa-
tient “turned pale, limp, and a cold
perspiration appeared on his fore- 
head. His pulse was unobtain- 
able... .” The EKG tracing showed 
complete auricular and ventricular 
standstill for a time interval of four 
beats, followed by a normal sinu- 
auricular beat. After a rest period of 
10 minutes, the experiment was re-
peated with comparable results. How-
ever, no attempt was made to de-
termine if cardiac arrest could be in-
duced in this patient by asking him
to visualize his worst attack of faint-
ness without a preceding “hypnotic
induction” procedure. The case sum- 
marized below suggests that the 
“hypnotic induction” and the “me- 
dium to deep hypnotic state” may 
have been unnecessary in producing 
this effect. 

McClure (1959) found that an ap-
propriately predisposed person could
voluntarily produce cardiac stand- 
still. The S in this case, a 44-year- 
old airplane mechanic, had discovered 
that he could induce a progressive 
slowing of the pulse by relaxing com-
pletely. When asked to induce a dim-
inution of the heart rate in the labora-
tory, the S lay very quietly and
allowed respiration to become ex- 
tremely shallow. The EKG showed 
sinus arrest for a period of 5 seconds.
An EKG tracing taken 1 hour after 
the experiment was within normal 
limits. Since the S had rheumatic 
fever at age 7, McClure suggests the 
following tentative explanation of
this performance:


Electrocardiographic Alterations
Induced by Hypnotic Stimulation 


Bennett and Scott (1949) found 
that one of five excellent hypnotic Ss 
showed tachycardia and T wave ab- 
normalities on the EKG—lowering 
or disappearance of T in Leads I, II,
and III—within 2 minutes following
suggestions intended to induce anxi-
ety and anger. The S in this case was
an emotionally stable and well-ad- 
justed young male with no history of 
cardiac disorders and with an other- 
wise normal EKG. Since such EKG 
abnormalities are not normally asso- 
ciated with tachycardia, two electro- 
cardiographers, who were not in- 
formed of the experimental condi- 
tions under which the tracings were 
made, interpreted the records as in- 
dicating coronary artery disease or 
acute rheumatic fever. Finding in a 
subsequent study with the same S 
that subcutaneous administration of 
epinephrine elicited lower T waves 
in Leads I and II than those found 
during the control experiment, the 
authors suggest that the EKG altera- 
tions induced during the hypnotic 
experiment may have been an in- 
direct result of sympathetic stimula- 
tion and release of epinephrine from 
the adrenal medulla. Berman, Si-
monson, and Heron (1954) confirmed
this study; employing 14 susceptible 
hypnotic Ss with otherwise normal
EKG, they found that during hyp- 
notically-induced fear and anxiety, 
two showed elevation and five showed 
depression or inversion of T waves. 
In a second experiment these workers 
found that although “deep hypnosis” 
could not be induced in 11 patients 
with coronary sclerosis and angina 
pectoris, four showed T wave changes 
when given emotion inducing sugges- 
tions. 

The above experiments demon- 
strate that EKG alterations resem- 
bling those found in grave cardiac 
disorders can be induced in some 
“hypnotized” Ss by suggestions 
which evoke fear, anger, or anxiety; 
however, similar EKG abnormalities 
have been demonstrated in some 
normal persons during emotional 
arousal. Mainzer and Krause (1940) 
compared the EKG tracings of 53 
unselected surgical patients recorded 
the day before surgery and on the
operating table immediately before 
the induction of general anesthesia. 
As compared with the earlier trac- 
ings, 40% of the tracings recorded 
immediately before surgery showed 
various abnormalities such as S-T de- 
pression with T low, inverted, or ab- 
sent. Along similar lines, Landis and 
Slight (1929) and Loftus, Gold, and 
Diethelm (1945) demonstrated that 
some persons with otherwise normal 
EKG show abnormalities of the ST 
segment and the T wave during 
startle or anxiety; Crede, Chivers, 
and Shapiro (1951) found that in 
rare cases mere anticipation of the 
EKG test is sufficient to produce in- 
verted T waves in normal individ- 
uals; and Ljung (1949) published a 
study of 14 Ss with no evidence of 
cardiac disease who showed abnor- 
mal T waves during apparently 
slight emotional stimulation. After 
summarizing these and related in-
vestigations, Weiss (1956) suggests
that such EKG effects are found dur- 
ing emotional stimulation in persons 
who are prone to show an elevation 
of sympathetic tone and an increase 
in cardiac metabolism without a cor- 
responding increase in the coronary 
circulation. 

In brief, the above studies on car-
diac functions indicate the following:

1. In very rare cases, it is possible 
to produce cardiac acceleration or 
complete stoppage of the heart action 
by appropriate hypnotic stimulation; 
however, in very rare cases, similar 
effects can be voluntarily produced 
by unhypnotized persons. 

2. Although some hypnotized per- 
sons show EKG alterations resem- 
bling those found in organic heart 
disease following suggestions designed 
to induce fear, anxiety, or anger, 
some persons who have not been given 
an “hypnotic induction” and who do 
not appear to be “in trance” show
similar EKG alterations during emo-
tional stimulation.


EFFECT OF HYPNOTIC STIMULATION
ON METABOLIC AND GASTROINTES-
TINAL FUNCTIONS


Effect of Hypnotic Stimulation on
Blood Glucose Levels

A number of experiments appear to
indicate that hypnotized persons
show an elevation of blood glucose
levels when given the direct sugges-
tion that blood sugar will increase.
Before discussing these studies, it is 
appropriate to note the following: 
1. The level of blood glucose ap- 
pears to be closely related to the 
level of “arousal”; blood sugar tends 
to increase during anxiety, emotion, 
or maintained activity and to de-
crease during relaxation, depression,
or sleep (Dunbar, 1954, Ch. 8).
2. The blood glucose level is ex-
cessively labile in diabetics, i.e., as
compared with normal persons, dia-
betics tend to show more extreme
alterations in blood sugar content 
during periods of high or low 
“arousal” (Hinkle & Wolf, 1953; 
Mirsky, 1948). 

The above postulates suggest that 
in diabetic patients any procedure 
(hypnotic or nonhypnotic) which 
induces relaxation or minimizes ex- 
citability will tend to depress the 
blood sugar level and minimize gly- 
cosuria and any procedure which in- 
duces arousal or excitability will tend 
to elevate the blood glucose level and 
increase glycosuria. Data support- 
ing this hypothesis have been pre- 
sented by Bauch (1935) in a study 
of the effects of training in relaxation 
(Schultz's “autogene training”) on
seven diabetic patients. Each pa- 
tient showed a significant decrease in 
blood sugar levels after becoming 
proficient in inducing relaxation—
insulin dosage was reduced in each
case by 10 to 20 units. Apparently,
healthy persons do not show the same 
degree of reduction in blood glucose 
content after achieving the same suc- 
cess in producing relaxation (Schultz 
& Luthe, 1959). Along similar lines,
Mohr (1925) relieved a “full-fledged
diabetic” of glycosuria by hypnotic
suggestions which were effective in
mitigating his “affective excitability” 
toward certain significant persons in 
his surroundings and was able to 
reinstate the glycosuria by suggest- 
ing that he would again be upset by 
these people. This experiment was 
repeated four times with the same re- 
sults. 

With the above findings in mind, 
the results reported in two hypnotic 
experiments become less mysterious. 
Gigon, Aigner, and Brauch (1926) 
found that blood sugar tended to be 
reduced in four hypnotized diabetic 
patients following suggestions of re-
laxation and suggestions that “the
pancreas would secrete insulin and 
that blood and urine sugar would 
markedly decrease.” Although the 
reduction in blood glucose in these 
cases may have been due to the sug- 
gestion that “the pancreas would 
secrete insulin,” it appears equally 
plausible that it was a secondary ef- 
fect of the suggestions of relaxation. 
Along similar lines, Stein (quoted by 
Dunbar, 1954, p. 291) reported that 
direct suggestions that blood sugar 
would decrease given to six hypno- 
tized diabetic patients resulted in re- 
duced blood sugar in 47 out of 56 at- 
tempts. Again, it appears plausible 
that the reduced blood glucose in 
these cases was an indirect result of 
the suggestions of relaxation given 
during the “hypnotic induction” pro- 
cedure. Supporting evidence for this 
supposition is presented in a second 
experiment by the same investigator; 
although Stein used only one diabetic 
patient in this study, he found that 
an “hypnotic induction” (apparently 
consisting of suggestions of quietude, 
relaxation, and drowsiness) resulted 
in a significant fall in blood glucose 
content without suggesting that the 
blood sugar would fall.

Is it possible to elevate the blood 
sugar level by suggesting to a non- 
diabetic hypnotic S that he is ingest- 
ing sugar? Marcus and Sahlgren 
(1925) found no rise in blood glucose 
content when four “deeply hypno-
tized” nondiabetics were given a
saccharin solution which they were 
told was a sugar solution. Similarly, 
Nielsen and Geert-Jorgensen (1928) 
found no elevation in the fasting 
blood sugar level when six excellent 
hypnotic Ss (nondiabetics) were given 
the suggestion that a glass of water 
contained large amounts of sugar. 
In contradistinction to the above, 
Povorinskij and Finne (1930) found 
an elevated blood sugar content in
two somnambulistic hypnotic Ss 
after inducing an hallucination of in- 
gesting sugar and honey; however, 
an elevation in blood glucose could 
be demonstrated in one of these Ss 
following similar suggestions given 
during “the waking state.” The data 
presented in the report do not exclude 
the possibility that the hypnotic sug- 
gestions which induced an “halluc- 
ination” of ingesting sugar and honey 
served to “arouse” the subjects or to 
induce emotional excitement. 


Effect of Hypnotic Stimulation on 
Gastric Functions 
The evidence indicates that stom- 
ach secretions, hunger contractions, 
and various other gastrointestinal 
functions can be influenced by ap- 
propriate suggestions given to a 
hypnotized person. Ikemi (1959) 
demonstrated that suggestions given 
during hypnosis of eating a delicious 
meal resulted in an increase in free 
acid, total acidity, and quantity of 
gastric secretions in 34 out of 36 
healthy young persons. In an earlier 
experiment, Heyer (1925) introduced 
a tube into the stomach of a “deeply 
hypnotized” S and removed the con- 
tents. If no secretion occurred 
within 10 minutes, the S was given 
the suggestion that he was ingesting 
either meat broth, bread, or milk and 
the gastric secretions were collected 
at 5-minute intervals and examined 
for quantity, acidity, and proteolytic 
activity. Each of the suggested meals 
evoked a secretion of approximately 
6 to 10 cubic centimeters of “gastric 
juice” within 10 to 15 minutes and 
the acidity and proteolytic activity 
appeared to vary with each food sug- 
gested. Delhougne and Hansen 
(1927) reported a similar study with 
one somnambulistic S. After the S 
was placed in “deep hypnosis,” the 
stomach and duodenal secretions
were aspirated by means of a Reh- 
fuss tube. Following this, the S was 
given the suggestion that he was in- 
gesting a meal which was rich in pro- 
tein (Schnitzel), rich in fat (a biscuit 
thickly covered with butter), or rich 
in carbohydrate (chocolate and 
marchpane). Each of the suggested 
meals evoked secretions of acid, pep- 
sin, and lipase from the stomach and 
of trypsin, lipase, and diastase from 
the pancreas. Although the authors 
do not analyze the data statistically, 
they conclude that the hallucinated 
meals were as effective as actual 
meals in eliciting specific secretions 
from the stomach and pancreas, e.g., 
the hallucinated protein meal sup- 
posedly induced a specific increase 
in the secretion of pepsin and trypsin, 
the hallucinated fatty meal sup- 
posedly induced a specific increase in 
the secretion of lipase. This startling 
conclusion, however, appears to be 
erroneous; a statistical analysis indi- 
cates that the quantity of each of the 
enzymes found after the three hallu- 
cinated meals was not significantly 
different. 

The above studies do not answer a 
crucial question: Was the “hypnotic 
induction” and the appearance of 
“deep trance” on the part of the Ss 
necessary to produce these effects? 
If the Ss had been asked to vividly 
imagine or to think about eating cer- 
tain foods (without an “hypnotic in- 
duction”) would they have shown 
similar pancreatic and gastric secre- 
tory activity? That such may have 
been the case is suggested by an 
earlier experiment reported by Luck- 
hardt and Johnston (1924). These 
investigators also found that when a 
hypnotized S was given suggestions 
of eating a fictitious meal, he showed 
an increase in the volume and acidity 
of the digestive secretions compa-
rable to that found when actually eat-
ing a meal; however, in the control
experiment, when the investigators 
merely talked to the S about an 
appetizing meal, he showed similar 
gastric secretory activity. This find- 
ing is not unusual. Miller, Bergeim, 
Rehfuss, and Hawk (1920) reported 
that the sound and thought alone of 
a frying steak gave rise to gastric 
secretory activity in some normal Ss. 
Employing a S with a gastric fistula, 
Wolf and Wolff (1947) demonstrated 
that during the “mere discussion” of
eating a certain food the output of 
hydrochloric acid from the parietal 
cells was essentially the same as 
when actually ingesting this food. 
Similar effects have been demon- 
strated in other parts of the gastro- 
intestinal tract. Bykov (1957) found 
that in patients with a gall bladder 
fistula (but otherwise physiologically 
normal) “the sight of and even the
mere mention of food evoked con- 
traction of the gall bladder” (p. 119). 
The same investigator also studied a 
patient with a fistula of the pancre- 
atic duct but otherwise healthy and 
with a normal digestive tract; 1 or 2 
minutes after being drawn into con- 
versation about savory foods, this pa- 
tient (who was kept on a special diet 
which served to inhibit secretions) 
“showed against this inhibitory back- 
ground abundant pancreatic secre- 
tions.” (The above patients had not 
participated in experimental condi-
tioning procedures.)

Scantlebury and Patterson (1940) 
demonstrated that suggestions of eat- 
ing a fictitious meal were effective in 
inducing a temporary and at times a 
complete cessation of gastric hunger 
contractions in a hypnotized S. Lewis 
and Sarbin (1943) repeated this ex- 
periment, employing the Carlson 
balloon-manometer method with 
eight Ss who had fasted prior to the 
experiment. The Ss were first given
the Friedlander-Sarbin hypnotic in-
duction procedure and rated on
“depth of hypnosis.” Whenever the
Ss showed gastric hunger contrac- 
tions, they were given the suggestion 
of eating a meal. The kymographic 
tracings showed that the suggestions 
were effective in inhibiting the hun- 
ger contractions in the majority of 
trials with the “deeply hypnotized” 
Ss, in some of the trials with the 
“moderately hypnotized” Ss, and in 
none of the trials with Ss who were 
“slightly hypnotized” or not hypno- 
tized. However, a comparable inhibi- 
tion of hunger contractions could be 
demonstrated in the “deeply hypno- 
tized” Ss by asking them to solve an 
arithmetic problem silently. No 
attempt was made to determine if 
hunger contractions could be inhib- 
ited in unhypnotized persons by ask- 
ing them to “vividly imagine” eating 
a delicious meal. 

Earlier studies which did not em- 
ploy hypnotic procedures found com- 
parable effects. For example, Carl- 
son (1916, p. 152) found that after 4 
days of fasting the sight and smell 
of food inhibited his hunger contrac- 
tions. Since acid in contact with the 
gastric mucosa apparently acts re- 
flexly to produce inhibition of gastric 
contractions (Carlson, 1916, pp. 175- 
176) and since the “mere thought” 
of appetizing food gives rise to a sig- 
nificant amount of hydrochloric acid 
secretion in some normal persons 
(Miller et al., 1920), it can be hy- 
pothesized that suggestions of eat- 
ing a meal are effective in some “hyp- 
notized” Ss and some unhypnotized 
Ss in inducing gastric acid secretions 
which act reflexly to inhibit the 
hunger movements. 

In summary, the above studies on 
metabolic and gastrointestinal func- 
tions appear to indicate that blood 
sugar levels can be altered in diabetic
patients by hypnotic or nonhypnotic 
procedures which alter the level of 
“arousal,” and gastric and pancreatic 
secretions and gastric hunger con- 
tractions can be influenced by sym- 
bolic stimulation in both hypnotized 
and unhypnotized persons. 


EFFECT OF HYPNOTIC STIMULATION
ON CUTANEOUS FUNCTIONS


Production of Herpetic Blisters (Cold 
Sores) by Hypnotic Stimulation 
Ullman (1947) reported that a pa- 
tient (who had been previously cured 
of hysterical blindness) showed mul- 
tiple herpetic blisters on the lower 
lip 25 hours after it was suggested to 
him “while in hypnotic trance” that 
he was rundown and debili-
tated, he felt as if he were catching
cold, and fever blisters were forming 
on his lower lip. Heilig and Hoff 
(1928) had previously demonstrated 
a similar effect in an experiment with 
three “neurotic” women. Their pro- 
cedure was as follows: After a formal 
hypnotic induction, an intense emo- 
tional reaction was elicited from each 
S by suggesting an extremely un- 
pleasant experience related to her 
previous life history. During the 
excitement, the experimenter stroked 
the S's lower lip and suggested a feel- 
ing of itch such as she had experi- 
enced previously when a cold sore 
was forming. Within 48 hours after 
the termination of the experiment 
small blisters had appeared on the 
lower lip of each S. This report also 
includes the following data: at least 
two of the Ss had a history of recur- 
rent herpes labialis following emo- 
tional arousal; determination of the 
opsonic index before and after the 
hypnotic experiment indicated that 
the Ss’ physiological resistance was 
reduced after the experiment; her- 
petic blisters could not be induced 
when the hypnotized Ss were given
direct suggestions that such blisters
were forming without at the same
time eliciting an emotional reaction.

The above studies can be placed 
in broader context by noting the fol-
lowing: (a) The herpes simplex virus
appears to be ubiquitous and ready 
to produce illness whenever the nor-
mal balance between it and the host
is disturbed not only by fever, aller- 
gic reactions, sunburn, and so forth, 
but also by emotional stress and by 
symbolic stimulation which has sig- 
nificance for the person (Sulzberger 
& Zaidens, 1948). (b) Some persons
show recurrent attacks of herpes 
simplex in the same localized area 
(Veress, 1936); in some cases the at- 
tacks appear to be closely related to 
“emotional conflicts” or to stimula- 
tion which tends to elevate the level 
of “arousal” (Blank & Brody, 1950; 
Schneck, 1947). These findings sug- 
gest that an “hypnotic induction” 
procedure and specific suggestions of 
blister formation may not be neces- 
sary to induce herpetic blisters in 
appropriately predisposed persons. 
An experiment along the following 
lines is indicated: An experimental 
group consisting of persons with a 
history of herpes labialis should be 
given appropriate stimulation to in- 
duce emotional arousal without an 
hypnotic procedure. A second experi- 
mental group consisting of persons 
who do not have a history of herpes 
should be placed in “deep hypnosis” 
and given specific suggestions of cold 
sore formation. It can be hypothe- 
sized that some of the unhypnotized 
Ss in the first group will show her-
petic blisters within a day or so after
the experiment. It would be of in- 
terest to determine if any of the 
“deeply hypnotized” Ss in the sec- 
ond group will show cold sores after 
the experiment. 


Induction of Localized (Nonherpetic)
Blisters by Hypnotic Stimulation 


Pattie (1941) has reviewed 11 ex- 
periments which ostensibly demon- 
strate that localized blisters (exclud- 
ing cold sores) can be evoked by 
direct suggestions given to somnam- 
bulistic hypnotic Ss. A relatively 
well controlled experiment reported 
by Hadfield (1917) can be taken as 
the prototype of these investigations: 
After the S was hypnotized, an as- 
sistant touched his arm while Had- 
field gave continuous suggestions 
that a red-hot iron was being applied 
and that a blister would form in the 
burned area. The arm was then 
bound in a sealed bandage and the S 
was watched continuously during the 
following 24 hours. At the end of this 
period the bandage was opened in the 
presence of three physicians and, on 
the designated area, the beginning of 
a blister was noted which gradually 
developed during the day to form a 
large bleb surrounded by an area of 
inflammation. Although the other
experiments followed this general 
pattern, there are numerous varia- 
tions: in some instances, the experi- 
menter stated that a blister would 
form after a definite time interval 
and in other instances no time was 
specified; some Ss were instructed to 
awaken immediately after the sug- 
gestion of bulla formation and others 
were not given such instructions 
until it was determined if the blister 
had formed; although in most in- 
stances the blister formed in the area 
specified, in at least two instances 
(Jendrassik, 1888; Smirnoff, 1912) 
the bleb formed in another body area. 
Also, in at least two experiments (Ry- 
balkin, 1890; von Krafft-Ebing, 
1889, pp. 26-27, 58-59) the controls 
were not satisfactory; the Ss were not
observed during the intervening pe-
riod and it is possible that they may
have deliberately injured the area. 
Two additional cases have been re-
ported since the publication of Pat-
tie’s (1941) review. Ullman's (1947)
S, mentioned in the preceding sec- 
tion of the present paper, had previ- 
ously been cured of hysterical blind- 
ness and had previously shown her- 
petic blisters after hypnotic stimula- 
tion. In an additional hypnotic ses- 
sion, the S was induced to recall the 
battle in which he had recently par-
ticipated and was given the sugges-
tion that a small particle of molten
shell fragment had glanced off the 
dorsum of his hand. At this point in 
the procedure, the experimenter 
brushed the hand with a small flat 
file to add emphasis to the sugges-
tion. Pallor followed immediately in
this circumscribed area approxi- 
mately 1 centimeter in diameter; 
after 20 minutes a narrow red mar- 
gin was evident about the area of 
pallor and after 1 hour the beginning 
of a blister was noticeable. The S 
was then dismissed and returned ap- 
proximately 4 hours later; at this 
time a bleb about 1 centimeter in 
diameter was evident. (The S was 
not observed during the intervening 
period.) More recently, Borelli
and Geertz (Borelli, 1953) succeeded
in inducing dermatological altera- 
tions which superficially resembled 
blister formation in a 27-year-old pa- 
tient with “neurodermatitis.” Dur- 
ing “deep hypnosis” a coin was 
placed on the normal skin of the hand
and it was suggested that a blister
would form within a day at the spot
where the fictitious burn was occur-
ring. The next day the patient showed
a sharply circumscribed and elevated 
area at the designated spot which 
superficially resembled a blister but 
could be more appropriately de- 
scribed as white dermographism. 
With few if any exceptions in-
vestigators reporting positive results
emphasize that they selected som- 
nambulistic hypnotic Ss for their ex- 
periments; however, a number of 
workers using similar procedures 
with somnambulistic Ss have re- 
ported negative results in all cases 
(Sarbin, 1956; Wells, 1944), or have 
reported negative results with the 
majority of such Ss and positive re- 
sults only in rare cases (Hadfield, 
1920). These negative findings ap- 
pear to indicate that appropriate sug- 
gestions given to “deeply hypno- 
tized” persons may be necessary but 
by no means sufficient conditions for 
this phenomenon. 

An additional factor which appears 
necessary is indicated by the follow- 
ing. The 13 persons who gave os- 
tensibly positive results were not a 
cross section of the normal popula- 
tion: prior to the experiment, one had 
been cured of hysterical blindness 
and one had been cured of hysterical 
aphonia; during the time of the ex- 
periment, six were diagnosed as hys- 
terical and one was being treated for 
“shell-shock.” At least five of these 
Ss had histories of localized skin reac- 
tions: one had “a delicate skin” and 
showed labile vasomotor reactions 
(Doswald & Kreibich, 1906, Case 1), 
a second had suffered from “neurotic 
skin gangrene” and had a history of 
wheals following emotional arousal 
(Doswald & Kreibich, 1906, Case 2), 
a third had “a delicate skin” plus 
“dermographia of medium grade” 
(Heller & Schultz, 1909), a fourth 
had suffered from “hysterical ecchy- 
moses” (Schindler, 1927), and a fifth 
was afflicted with atopic dermatitis 
(Borelli, 1953). This suggests that 
the induction of localized blisters by
hypnotic stimulation may be possible
only in a small group of persons with 
a unique physiological predisposition. 
What is the nature of this “predis-
position”? The data summarized be-
low suggest a tentative answer.

Blister formation and wheal for- 
mation apparently involve similar 
physiological and biochemical proc- 
esses: the circular wheal of urticaria, 
the linear wheals of dermographism, 
and the blister resulting from a burn 
can be viewed as variations of the 
“triple response” of the skin to in- 
jury, consisting of the release of 
histamine or a histamine-like sub- 
stance such as 5-hydroxytryptamine 
(serotonin) from the mast cells, a
local dilation of the minute vessels, 
an increase in permeability of the ves- 
sels, and a widespread arteriolar dila- 
tion (Lewis, 1927; Nilzén, 1947). 
Nearly every type of stimulus that 
produces whealing when applied to 
the skin will lead to blistering if ren- 
dered more intense, and blister for- 
mation appears to differ from wheal 
formation primarily in that the in- 
creased permeability of the vessel 
walls is of greater degree, the tran- 
suded fluid typically forms a pool in 
the superficial layers of the skin, and 
the epidermal layers are gradually 
forced asunder (Lewis, 1927). This 
close relationship between wheals 
and blisters appears to be significant 
because of the following: 

1. In at least two of the “success-
ful” hypnotic experiments (Borelli,
1953; Doswald & Kreibich, 1906, 
Case 2) the dermatological changes 
induced were much more similar to 
wheals than to blisters.

2. A critical reading of the other
reports suggests that the histological
findings were rarely clear-cut enough
to permit the definite conclusion that
blisters and not wheals were produced.

3. Some unhypnotized persons 
show localized wheals when recalling 
former experiences in which such 
dermatological effects occurred. 

4. Some unhypnotized persons 
show localized wheals after mild me-
chanical stimulation.

Moody (1946, 1948) has presented 
two case studies of patients who de- 
veloped localized wheals when re- 
calling former experiences in which 
wheals occurred. The first patient 
had been previously hospitalized for 
sleepwalking with aggressive behav-
ior. On one occasion, during this
earlier hospitalization, the patient's 
hands had been tied behind his back 
during sleep and wheals had formed 
in the traumatized area. At a later
time, when recalling this experience 
after hexobarbital administration, 
wheals appeared on both forearms in 
the area which had previously been 
tied. On at least 30 occasions when 
recalling earlier experiences of physi- 
cal injury, the second patient (who 
was being treated for “nervous break- 
down”) showed swelling, bruising, 
and bleeding in the body parts where
the original injury presumably oc-
curred; for instance, when remember-
ing a former occasion when she had 
been struck across the dorsum of both 
hands with a cutting whip, the pa- 
tient showed wheals on both hands in 
the respective areas. Along similar 
lines, Graff and Wallerstein (1954) 
reported that during a therapeutic 
interview a 27-year-old sailor, who 
had a tattoo of a dagger on his arm, 
suddenly showed a wheal reaction 
sharply limited to the outline of the 
dagger. The wheal subsided after
this session but reappeared again in 
the same way during a subsequent 
interview. The authors interpret the 
patient's free associations as indicat-
ing that the wheal had symbolic sig-
nificance for the patient. Brandt
(1950) has reported similar cases of
patients showing sharply localized
wheal reactions which appeared to
be closely related to symbolic stimu-
lation.

Dermographism (that is, wheal 
formation in response to a single mod-
erately strong stroking of the skin) is
not as uncommon as is generally as- 
sumed. Testing 84 apparently nor- 
mal young men, Lewis (1927) found 
a detectable swelling of the skin as a 
reaction to a single firm stroke in 
25%; in 5% a full wheal developed. 
Some persons also show wheal forma- 
tion at sites of mild pressure stimula- 
tion such as around a wristwatch 
strap, a belt, or a collar. Graham 
and Wolf (1950) reported an experi- 
mental study of 30 such persons who 
had a history of urticaria and showed 
“spontaneous” wheals in areas of 
mild pressure. All of these Ss also 
showed dermographism although in 
some this was not apparent until 
stressful interviews had altered the 
condition of the skin vessels. Skin
temperature measurements and in- 
direct measurements of the state of 
the minute vessels (reactive hy- 
peremia threshold) indicated that the 
Ss were prone to respond with vaso- 
dilation of both arterioles and minute 
vessels to numerous stimuli. Since 
in all but one of the successful hyp- 
notic experiments tactual stimula- 
tion was employed to localize the 
pseudotrauma and since in many of 
the experiments the stimulus object 
was a small piece of metal and was 
either allowed to remain in contact 
with the skin or was replaced by a 
bandage, it appears plausible, as 
Weitzenhoffer (1953, p. 144) has sug- 
gested, that similar physiological 
mechanisms may be responsible for 
the above types of urticaria factitia 
and for at least some cases of the 
hypnotic production of localized 
“blisters.”
The above data suggest an experi- 
ment as follows: Persons who show 
gross vasomotor alterations during 
seemingly slight changes in the stim- 
ulating situation or who show der- 
mographism under normal conditions 
or during stress should be given the 
following instructions without a pre-
liminary “hypnotic induction”:
“Try to visualize a blister (in a speci- 
fied area) and tell yourself repeatedly 
that such a blister is forming.” If the 
Ss are adequately motivated to com- 
ply with these odd instructions, it 
can be hypothesized that some will 
show dermatological changes related 
to vesiculation. A second experi- 
mental group consisting of persons 
who do not show signs of vasomotor 
lability should be given suggestions 
of blister formation after an “hyp-
notic induction” procedure and when
they appear to be in “the trance 
state.” It would be of interest to de- 
termine if these “hypnotized” Ss will 
show any cutaneous reactions which 
are involved in the formation of a 
blister. 


Cure of Warts by “Hypnosis” 


Since the genesis of warts appears 
to be causally related to virus ac-
tivity and since present-day methods
of treating warts are “roundabout 
and nonspecific” (Pillsbury, Shelley, 
& Kligman, 1956, p. 690), recent re- 
ports indicating that appropriate sug- 
gestions given to a hypnotized person 
are singularly effective in curing 
these benign epitheliomas are of 
unique interest. Asher (1956) found 
that suggestions of wart disappear- 
ance given to 25 hypnotizable pa- 
tients resulted, after 4 to 20 treat- 
ments, in a complete cure in 15, a 
marked decrease in the number of 
warts in 4, and no apparent change 
in 6 patients. In these cases the warts 
before treatment varied from 2 to 53 
and were present from 3 months to 6 
years. Eight unhypnotizable pa- 
tients given similar suggestions 
showed no diminution in the number 
of warts; however, in these cases the 
treatment was discontinued after 10 
sessions. In a more extensive investi- 
gation, Ullman and Dodek (1960) 
attempted to relieve warts by hyp-
notic suggestions in 62 adults attend-
ing an outpatient clinic. At weekly
intervals each patient was given sug- 
gestions of sleep and drowsiness fol- 
lowed by suggestions to determine 
“the depth of hypnosis”; when the 
patient was judged to be at “the pe- 
riod of maximum hypnotic effect,” 
he was told that the warts would be- 
gin to disappear. Of the 47 patients 
rated as “poor hypnotic subjects,”
only 2 showed wart regression within 
a 4-week period. However, 6 of the 
15 patients rated as “good hypnotic 
subjects” had been cured of multiple 
common warts (or, in one case, of a 
single common wart) within 2 weeks 
following the initiation of treatment; 
within a 4-week period, 8 of the 15 
showed wart involution. In these 
successful cases the mean duration 
of the warts prior to treatment was 
19 months with a range of 3 weeks to 
6 years. 

The above investigations are open 
to the criticism that the warts may 
have shown spontaneous involution 
within the same period of time if no 
hypnotic treatment had been given. 
A recent study, however, appears to 
have satisfactorily controlled this 
factor. After an “hypnotic induc- 
tion” consisting of eye fixation and 
suggestions of relaxation, Sinclair- 
Gieben and Chalmers (1959) sug- 
gested to 14 patients (with common 
warts present bilaterally for at least 
6 months) that the warts on one side 
of the body would disappear. Ten of 
the 14 patients showed “adequate 
depth of hypnosis” as indicated by 
compliance with a simple posthyp- 
notic suggestion and by partial or 
complete amnesia. Within 5 weeks 
to 3 months, 9 of these 10 “hypnotiz- 
able” patients showed wart involu- 
tion on the “treated” side while the 
warts on the “control” side remained 
unchanged. (In one patient the “un-
treated” side showed wart regression
6 weeks after the “treated” side had 
been cured.) No benefit was observed 
from this treatment in the four pa- 
tients who were not able to attain 
“adequate hypnotic depth.” 
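The three studies above report their outcomes as raw counts, and the contrast between more and less hypnotizable patients is easier to see once those counts are converted to percentages. The following is a minimal Python sketch, not part of the original analyses: the dictionary `WART_STUDIES` and the helper `response_rate` are hypothetical names, and counting only Asher's complete cures as responders is our assumption.

```python
# Counts as reported in the text: (patients showing involution, group size).
# Assumption: for Asher (1956) only complete cures are counted as responders;
# 4 further hypnotizable patients showed a marked decrease in wart number.
WART_STUDIES = {
    "Asher (1956)": {
        "hypnotizable": (15, 25),
        "unhypnotizable": (0, 8),
    },
    "Ullman & Dodek (1960)": {
        "good hypnotic subjects": (8, 15),
        "poor hypnotic subjects": (2, 47),
    },
    "Sinclair-Gieben & Chalmers (1959)": {
        "adequate depth": (9, 10),
        "inadequate depth": (0, 4),
    },
}


def response_rate(responders: int, total: int) -> float:
    """Percentage of a group showing wart involution, to one decimal place."""
    return round(100.0 * responders / total, 1)


for study, groups in WART_STUDIES.items():
    for label, (k, n) in groups.items():
        print(f"{study:34s} {label:24s} {k:2d}/{n:<2d} = {response_rate(k, n):5.1f}%")
```

In each study the first-listed (more hypnotizable) group shows the higher rate, which is the pattern the text summarizes; the small group sizes, especially the four unsuccessful patients in the Sinclair-Gieben and Chalmers series, make the individual percentages unstable.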
Although the above studies indi- 
cate that symbolic stimulation is 
effective in inducing wart involution 
in some Ss who are able to attain “a 
deep hypnotic state,” equally suc- 
cessful results have been reported for 
a variety of suggestive procedures 
which do not involve an “hypnotic 
induction” or “the trance state.” 
Grumach (1927) found that 16 of 
18 patients with longstanding warts 
showed complete regression of these 
structures within 1 to 4 months after 
being given, at intervals of 8 to 14 
days, an intramuscular placebo in- 
jection (normal saline) in the upper 
arm while, at the same time, being 
told that they were receiving a new 
and powerful wart remedy. Alling-
ton (1934) followed up 84 patients
with longstanding warts treated with 
an intragluteal placebo injection (dis- 
tilled water); 35 (or 41.7%) were re- 
lieved of plane warts or common 
warts after only one injection, 4 were 
cured after two injections, and 1 
after three injections. Bloch (1927) 
reported comparable results with a 
somewhat different procedure. The 
patient was blindfolded and his hand 
was placed on a table containing an 
electric apparatus; although the elec- 
tricity was started no current reached 
the patient. The warts were then 
painted with an innocuous dye, the
blindfold was removed, and the pa- 
tient, now confronted with the luridly 
colored warts, was told that the warts 
were dead and must not be washed 
until they had disappeared. Of 179 
patients thus treated and adequately
followed up, 55 (or 30.7%) showed
wart involution after the first session
and an additional 43 patients (or
24%) showed wart involution after
additional sessions extending over a
period varying from 1 week to 3 
months. Using similar procedures, 
Bonjour (1929), Sulzberger and Wolf 
(1934), and Vollmer (1946) reported 
success in a comparable percentage 
of cases with warts of from 2 to 6 
years’ duration. In general, these
suggestive procedures were more ef- 
fective when the patient showed 
multiple warts rather than a single 
wart and when the warts were of the 
juvenile type rather than the com- 
mon type; this type of treatment also 
tended to be more successful with re- 
cent lesions and with younger pa- 
tients. 
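The percentages given for Allington (1934) and Bloch (1927) can be recomputed from the reported counts. The sketch below is illustrative only: `pct` is a hypothetical helper, and the cumulative figures (all patients relieved after one or more injections or sessions) are our own arithmetic on the reported numbers, not values stated by either author.

```python
def pct(count: int, total: int) -> float:
    """Percentage of `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Allington (1934): 84 patients; cures keyed by number of placebo injections.
ALLINGTON_N = 84
ALLINGTON_CURES = {1: 35, 2: 4, 3: 1}

# Bloch (1927): 179 adequately followed-up patients.
BLOCH_N = 179
BLOCH_CURES = {"first session": 55, "additional sessions": 43}

print("Allington, one injection:", pct(ALLINGTON_CURES[1], ALLINGTON_N))             # 41.7
print("Allington, cumulative:", pct(sum(ALLINGTON_CURES.values()), ALLINGTON_N))     # 47.6
print("Bloch, first session:", pct(BLOCH_CURES["first session"], BLOCH_N))          # 30.7
print("Bloch, additional sessions:", pct(BLOCH_CURES["additional sessions"], BLOCH_N))  # 24.0
print("Bloch, cumulative:", pct(sum(BLOCH_CURES.values()), BLOCH_N))                # 54.7
```

The single-step figures match those quoted in the text (41.7%, 30.7%, 24%); the cumulative rates simply add the later cures to the earlier ones over the same denominator.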

Would a similar percentage of pa- 
tients have shown spontaneous re- 
mission of warts if they had not been 
“treated” in the specified period of 
time involved in the above experi-
ments? Memmesheimer and Eisen-
lohr (1931) matched 70 patients
treated by a suggestive procedure— 
painting the warts with methylene 
blue and suggesting their disappear- 
ance—with 70 patients with similar 
warts of similar duration not given 
any treatment. The results were as 
follows: at the end of 1 month, 11 of 
the treated patients showed wart 
resolution as compared to only 2 of 
the patients in the control group; at 
the end of 3 months, 14 of the treated 
patients were cured as compared to 
only 5 of the untreated; however, at 
the end of 6 months, 20 patients in 
the control group showed wart invo- 
lution as compared to only 17 pa- 
tients in the treated group. The con- 
clusion suggested by this study, 
namely, that suggestive treatment 
may accelerate a spontaneous physi- 
ological process leading to wart invo- 
lution, is supported by additional in- 
vestigations summarized below. 
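Expressing the counts from the matched comparison just described as rates makes the crossover explicit: the treated group leads at 1 and 3 months, and the untreated group leads at 6 months. This is a minimal Python sketch. It assumes that each reported count refers to all patients resolved by that follow-up, which the text implies but does not state. `FOLLOW_UP` and `rates` are hypothetical names.

```python
N_PER_GROUP = 70  # treated and matched untreated groups were the same size

# Follow-up month -> (treated patients with resolution, control patients with
# resolution), as reported; assumed to be counts resolved by that month.
FOLLOW_UP = {1: (11, 2), 3: (14, 5), 6: (17, 20)}


def rates(treated: int, control: int, n: int = N_PER_GROUP) -> tuple:
    """Resolution rates (%) for the treated and control groups."""
    return round(100.0 * treated / n, 1), round(100.0 * control / n, 1)


for month, (treated, control) in FOLLOW_UP.items():
    t_rate, c_rate = rates(treated, control)
    leader = "treated ahead" if treated > control else "control ahead"
    print(f"{month} mo: treated {t_rate:4.1f}%, control {c_rate:4.1f}% ({leader})")
```

An early advantage that disappears by 6 months is what one would expect if suggestion speeds up a process that would eventually occur anyway. This is the acceleration interpretation stated in the text.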

Similar physiological processes 
have been demonstrated when warts 
heal spontaneously and when they
are cured in apparent response to 
symbolic stimulation. Unna (quoted 
by Samek, 1931) observed histologi- 
cally that during spontaneous remis- 
sion the normal cutis surrounding the 
wart showed a distinctive reaction 
consisting of hyperemia and cell 
proliferation. Other workers (Alling- 
ton, 1952; Biberstein, 1944; Sulz- 
berger & Wolf, 1934; Vollmer, 1946) 
have also noted a distinct inflamma- 
tory reaction immediately before 
spontaneous healing or before wart 
disappearance in apparent response 
to suggestion or to chemical treat- 
ment. In histological studies of warts 
undergoing involution in a patient 
treated by a suggestive procedure, 
Samek (1931) demonstrated a spe- 
cific inflammatory reaction in the 
dermis consisting of dilation of blood 
vessels, hyperemia, edema, and peri- 
vascular infiltration of leucocytes 
(especially lymphocytes). Concomi- 
tant with this inflammatory reaction, 
mitoses became less frequent in the 
germinative epidermis (stratum mu- 
cosum); with mitoses almost at a 
standstill, the prickle-cell layer be- 
came thin, a normal stratum granulo- 
sum reformed, and the degenerated 
cells flaked off. 

After a careful review of the above 
and related studies, Allington (1952) 
concluded that “at times the balance 
between susceptibility and immunity 
in warts must be a delicate one [and] 
only a slight shift may be needed to 
cause their disappearance.” Vollmer 
(1946) had similarly concluded from 
an earlier review that a labile equi- 
librium must exist between the 
physiological processes which main- 
tain the wart and those which cause 
wart involution and that appropriate 
verbal stimulation may alter the 
equilibrium in the direction of wart 
resolution by causing hyperemia in 
the surrounding tissue. A number of
earlier workers (Sulzberger & Wolf, 
1934; Zwick, 1932) had also pointed 
to vasomotor changes as crucial fac- 
tors in wart remission and, more re- 
cently, Ullman (1959) presented data 
indicating that when warts are treat- 
ed by suggestion an “affective re- 
sponse” is induced in the patient and 
the mechanism of healing may be de- 
pendent on local vascular alterations 
which accompany the emotional re- 
action. Since a number of investiga- 
tions reviewed in an earlier section 
of the present paper suggest that 
localized vasodilation and localized 
vasoconstriction can be induced in 
some individuals by symbolic stimu-
lation (e.g., by asking the individual
to recall former experiences in which
such vasomotor alterations occurred;
Menzies, 1941), further investiga-
tions are required to determine the
following: (a) Are local vasomotor 
changes consistently present when 
wart resolution is occurring after sug- 
gestive treatment? (b) If so, do such 
vasomotor effects accelerate a nat- 
ural physiological process of wart re- 
mission? (c) Is treatment of warts 
by suggestive procedures relatively 
more successful in persons who show 
vasomotor lability, that is, in persons 
who respond with a greater than 
average degree of vasodilation or 
vasoconstriction to symbolic stimu- 
lation or to emotion-inducing stimu- 
lation? 


THE PHYSIOLOGICAL CORRELATES
OF “THE HYPNOTIC STATE”


The studies reviewed above sug- 
gest the general conclusion that many 
if not all of the physiological effects 
which can be induced in some Ss dur- 
ing “hypnosis” can also be induced 
in some persons without hypnosis. 
The experiments reviewed below sug-
gest that it is difficult if not impossi-
ble to find a physiological index
which differentiates “the hypnotic
state” from “the normal waking
state.”

During recent years a large
number of experiments have been de-
signed to determine whether “hypnosis” is
characterized by an elevated or de- 
pressed metabolic rate, heart rate, 
blood pressure, skin conductance, 
respiratory rate, digital blood vol- 
ume, etc. All of these investigations 
lead to a similar conclusion: Physio- 
logical functions vary in the same 
way during “hypnosis” as they do 
during “waking” behavior. Taking 
energy expenditure as the example, 
the evidence indicates that metabolic
rate may be elevated, may be de- 
pressed, or may not be significantly 
altered during “the hypnotic state”: 
Grafe and Mayer (1923) found that 
hypnotized Ss tended to show an ele- 
vated metabolic rate; von Eiff (1950) 
found that 16 Ss showed an average 
depression of 7% in “basal” meta- 
bolic rate during hypnosis; and 
Whitehorn, Lundholm, Fox, and 
Benedict (1932) reported that oxygen 
consumption was not significantly 
affected by hypnosis. Since the meta- 
bolic rate is elevated during “emo- 
tional arousal” and is depressed dur- 
ing relaxation and sleep (Best & Tay- 
lor, 1950, p. 622), these results are 
only superficially contradictory: Ex-
perimenters finding that “hypnosis”
depresses metabolism (von Eiff, 1950) 
had instructed their Ss to become re- 
laxed, drowsy, and sleepy and had 
not given additional suggestions that
could lead to arousal; investigators 
reporting that “the hypnotic state” 
does not affect metabolism (White- 
horn et al., 1932) had trained their 
Ss over a period of days to insure 
maximal relaxation when the meta- 
bolic rate was determined during the 
control experiment; experimenters 
finding that heat production was
elevated during “hypnosis” (Grafe & 
Mayer, 1923) had activated the Ss 
by suggesting various emotional ex- 
periences. 

Investigations designed to deter- 
mine if “hypnosis” is characterized 
by an elevated or depressed level of 
skin conductance have produced 
comparable results; during “the hyp- 
notic trance” Ss may show an eleva- 
tion, a slight decrease, or no signifi- 
cant change in palmar conductance 
(Barber & Coules, 1959; Davis & 
Kantor, 1935; Estabrooks, 1930; 
Levine, 1930). Since an elevated 
conductance level generally indicates 
an elevated “activation” level and a 
low level of conductance generally
indicates a low level of “activation”
(Duffy, 1957; Woodworth & Schlos-
berg, 1954), these results can be recon-
ciled as follows: (a) Hypnotic Ss
show an elevated level of palmar con- 
ductance when they carry out sug- 
gestions which involve effort or 
activity (Barber & Coules, 1959; 
Davis & Kantor, 1935). (b) When
given suggestions of relaxation and 
drowsiness, Ss participating in hyp- 
notic experiments may show a de- 
crease or no significant change in 
palmar conductance; if the S accepts 
the suggestions literally and relaxes, 
he shows a fall in conductance (Davis 
& Kantor, 1935; Estabrooks, 1930; 
Levine, 1930); if the S is aware that 
suggestions of drowsiness and relaxa- 
tion are not meant to be taken liter- 
ally, i.e., if he has learned from previ- 
ous participation in hypnotic experi- 
ments that to carry out subsequent 
suggestions properly he must remain 
alert, he generally shows no signifi- 
cant change in conductance (Barber 
& Coules, 1959). Investigations 
along similar lines which support the 
general conclusion that the “hypno-
tized” person does not differ signifi-
cantly from the normal person in
heart rate, respiratory rate, blood
pressure, digital blood volume, etc.,
have been reviewed by Gorton (1949),
Weitzenhoffer (1953), Sarbin (1956), 
and Crasilneck and Hall (1959). 
Some years ago it seemed that the 
electroencephalograph would prove 
to be a valuable tool for determining 
when a person was or was not “hyp-
notized.” This hope has not been
realized. Extensive work in this area, 
reviewed by Weitzenhoffer (1953) 
and Chertok and Kramarz (1959), 
has demonstrated that in the great 
majority of instances the hypnotized 
person continues to show his char- 
acteristic waking pattern on the 
EEG. However, if the operator 
makes it clear to the S that he should 
actually sleep—for example, by not 
disturbing the S after instructing 
him to sleep—some Ss participating 
in hypnotic experiments show delta 
activity on the EEG, indicating that 
they have literally gone to sleep 
(Barker & Burgwin, 1948; Schwarz 
et al., 1955), and others show “pe-
riods of brief flattening out of the
record ... sometimes accompanied 
by infrequent isolated theta 
rhythms,” indicating that they have 
gone into a light sleep (Chertok & 
Kramarz, 1959, p. 233). However, 
when the S is once more stimulated 
verbally by the hypnotist, he again 
shows his characteristic waking pat- 
tern on the EEG. In brief, studies 
employing the EEG indicate that the 
“hypnotized” person remains nor- 
mally awake until it is made clear 
that he should literally go to sleep 
and is then permitted to sleep. 
Within recent years, Lovett Doust 
(1953) and Ravitz (1951, 1959) have 
proposed two additional physiological 
indices of “the hypnotic state.” Em- 
ploying three hysterics and one psy- 
chopath as Ss, Lovett Doust found 
that “the induction of hypnosis” was
consistently accompanied by a sig-
nificant fall in arterial oxygen satura-
tion levels as measured by discon- 
tinuous spectroscopic oximetry at 
the fingernail fold. However, the 
term “induction of hypnosis,” as
used in this report, does not imply 
that the Ss carried out one or more 
of the classical hypnotic behaviors, 
e.g., limb rigidities, negative or posi- 
tive hallucinations; on the contrary, 
by this term the author refers to no 
more than the following: After being 
given suggestions of drowsiness, leth- 
argy, and sleep, the Ss appeared 
passive and lethargic. Since a person 
who appears passive and lethargic is 
not necessarily “hypnotized” (that 
is, does not necessarily carry out any 
of the classical hypnotic behaviors) 
and since a person who carries out all 
of the classical hypnotic behaviors 
does not necessarily appear drowsy or 
passive (Barber, 1960b; Barber & 
Coules, 1959; Wells, 1924), Lovett 
Doust’s findings are open to the fol- 
lowing interpretation: A relative 
anoxemia is found during drowsiness 
or passivity and is not necessarily 
found when a person is “in the hyp-
notic state,” i.e., when he carries
out the classical hypnotic behaviors. 
Supporting evidence for this inter- 
pretation is presented in a previous 
study by the same investigator 
(Lovett Doust & Schneider, 1952) 
which demonstrated a similar fall in 
oximetric values during sleep. 
Measuring standing potentials be- 
tween the forehead and the palm of 
the hand, Ravitz (1951, 1959) found 
that the “hypnotic induction” pro- 
cedure was accompanied by either a 
gradual increase or decrease in mean 
potential and “the trance state itself,
following induction” was typically 
characterized by a voltage decrease 
and by an increased regularity of the 
direct current (DC) tracings. How- 
ever, additional data presented in the 
reports suggest that a decrease in
voltage and an increased regularity of 
the DC tracings may be present 
whenever a person is relaxed and 
shows a low “arousal” level; for ex- 
ample, Ravitz notes that a decrease 
in voltage and an increase in regu- 
larity of the tracings are found dur- 
ing sleep and that increased voltage 
and decreased regularity are found 
during “changes in energy level,” 
excitability, loquaciousness, grief, 
anxiety, and so forth. Since, as 
pointed out above and as will be dis- 
cussed further below, a S need not 
show relaxation or passivity when he 
carries out the classical hypnotic 
behaviors, no conclusions can be 
deduced from these findings until the 
following hypotheses are experi- 
mentally confirmed or disproved: 
(a) Unhypnotized Ss, i.e., Ss who do 
not carry out such behaviors as limb 
rigidity, negative and positive hal- 
lucinations, age-regression, or am- 
nesia when given appropriate sug- 
gestions, show a relative decrease in 
voltage and an increased regularity 
of the DC tracings when instructed 
to become relaxed and passive. (b) 
If an “hypnotic induction” leading to 
drowsiness and passivity is not em- 
ployed, if, on the contrary, a direct 
suggestive procedure is used as de- 
scribed by Wells (1924) and Barber 
(1960b), “deeply hypnotized” Ss, i.e.,
Ss who carry out all of the classical 
hypnotic behaviors, do not show the 
above indices of “the trance state.”

The above investigations and more 
recent speculations concerning the 
neurophysiological correlates of 
“hypnosis” (Arnold, 1959; Roberts, 
1960; West, 1960) appear to be based 
on the following implicit assump-
tions: (a) When a person carries out
the type of behavior which has been 
historically associated with the term 
“hypnosis” he is in “an altered state” 
from his normal self, specifically, in 
“a trance state” or “an hypnotic
state.” (b) This “altered state” is of 
such a kind as to include a distinct 
and consistent type of physiological 
functioning which is rarely if ever 
present when a person is not carrying 
out hypnotic behavior. Although 
these assumptions are by no means 
limited to recent investigations (they 
are present in many if not all theories 
of “hypnosis” since the days of 
Mesmer), the evidence summarized 
below suggests that they are open to 
question. 


HYPNOTIC BEHAVIOR WITHOUT
AN “HYPNOTIC INDUCTION”


Since Ss participating in hypnotic 
experiments are almost always given 
an “hypnotic induction” consisting 
of suggestions of relaxation, drowsi- 
ness, and sleep, and since such a pro- 
cedure is generally effective in induc- 
ing an appearance of lethargy or 
“trance,” it often seems as if hypnotic 
behavior is a function of, or closely 
related to, “the trance state.” How- 
ever, in a pioneering study, Wells 
(1924) demonstrated that direct 
commands (e.g., “Your arm is insen-
sitive to pain,” “You cannot speak
your name”), repeated emphatically 
for a few seconds, were sufficient to 
elicit anesthesia, amnesia for name, 
limb rigidity, hallucinatory pain, 
total amnesia, automatic writing, 
and posthypnotic behavior in a large 
proportion of male college students. 
Wells insisted that his Ss did not ap- 
pear relaxed, drowsy, or lethargic 
and that he obtained results more 
quickly and with a larger proportion 
of Ss by such a direct procedure than 
by an “hypnotic induction” designed 
to induce “trance.” 

Recent investigations appear to 
confirm Wells’ results. In one study 
(Barber, 1960b) a female student 
research assistant (untrained as a 
hypnotist) gave 236 students at a 
girls’ college direct suggestions (each
suggestion requiring either 30 or 45
seconds) of body immobility (“Your
body is heavy, rigid, solid; it’s impos-
sible for you to stand up; try, you
can’t”), arm heaviness, arm levita-
tion, hand rigidity, inability to say
name, hallucination of thirst, selec- 
tive amnesia, and posthypnotic be- 
havior. Although an “hypnotic 
induction” was not employed, 49 Ss 
(or 20.8%) immediately carried out 
at least six of the eight suggestions 
and a total of 109 Ss (or 46%) car- 
ried out at least half of the sugges- 
tions. The postexperimental reports 
of these Ss were indistinguishable 
from the reports of persons who are 
said to be “hypnotized,” e.g., “I just 
couldn't get up from the chair,” “I 
was amazed when I couldn’t speak 
my name,” “I felt I was dying from
thirst.” In another study (Barber, 
1960b, 1960c) the results of such a 
direct procedure were compared with 
the results of a formal “hypnotic in- 
duction” procedure. In the first part 
of this experiment 70 attendants, 
nurses, and clerical workers at a state 
hospital (who agreed to participate 
in an experiment on “imagination”)
were given a series of suggestions 
(each suggestion requiring 30 sec- 
onds) appropriate to induce arm 
rigidity, arm levitation, limb heav- 
iness, limb anesthesia, hallucinations 
of thirst, heat, and cold, eye cat- 
alepsy, and hypnotic dream. Sim- 
ilar results were obtained as in the 
above study: 20 Ss (or 28.6%) im- 
mediately carried out at least seven 
of the nine suggestions and a total of 
34 Ss (or 48.6%) carried out at least 
five of the suggestions. In the second 
part of this experiment the same Ss 
were given an “hypnotic induction” 
procedure (consisting of suggestions 
of relaxation, drowsiness, and sleep) 
and then given the suggestions of arm 
rigidity, arm levitation, limb anes-
thesia, etc., as in the preceding ex-
periment. Although the Ss now ap-
peared to be “in trance” (and stated,
after the experiment, that they had 
“felt hypnotized”) a high correlation 
(r=.84) was obtained between scores 
in the two sessions; in general, Ss 
who carried out one or two sugges- 
tions in the first part of the experi- 
ment carried out the same one or two 
suggestions after the “hypnotic in- 
duction” and Ss who responded 
positively to all of the suggestions in 
the second part of the experiment 
had also carried out all of the sug- 
gestions without the “hypnotic in- 
duction.” 
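The percentages reported for the two Barber studies follow directly from the subject counts. The following is a minimal Python check that uses only the counts quoted above. The helper name `percent` is illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Share of subjects as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# (description, subjects meeting the criterion, subjects tested); counts as given in the text
rows = [
    ("college, at least 6 of 8 suggestions", 49, 236),
    ("college, at least 4 of 8 suggestions", 109, 236),
    ("hospital, at least 7 of 9 suggestions", 20, 70),
    ("hospital, at least 5 of 9 suggestions", 34, 70),
]

for description, k, n in rows:
    print(f"{description}: {k}/{n} = {percent(k, n)}%")
```

The second college figure comes to 46.2%, which the text rounds to 46%. The other three figures match the reported values.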

Related to the above are the results 
of other recent investigations (Barber,
1958a; Fisher, 1954) which indicate 
the following: 

1. If Ss participating in “hypnosis” 
experiments show lethargy, drowsi- 
ness, or other signs of “trance,” 
these characteristics can be readily 
removed and the “good” Ss will con- 
tinue to carry out the hypnotic 
performances if instructed: “Be 
perfectly awake. Come out of 
‘trance’ but continue to obey my 
commands.” 

2. Many if not all “good” hypnot- 
ic Ss carry out all suggestions given 
during the posthypnotic period, i.e., 
after they are told to wake up, as 
long as they believe that their rela- 
tionship with the operator remains 
that of subject and hypnotist. 

In brief: Investigations which pro- 
pose to find the physiological corre- 
lates of “hypnosis” uncritically as- 
sume that hypnotic behavior is a 
function of “the trance state”; this 
assumption is open to question. 
Appropriately predisposed persons do 
not need an “hypnotic induction” 
and need not appear to be in “trance” 
to carry out many if not all of the 
behaviors which have been associated 
with the term “hypnosis.” 
SUMMARY AND CONCLUSIONS


1. The normal person who is asked 
to “concentrate away from” red and
green responds to the Ishihara in the 
manner characteristic of the hypnotic 
“color-blind” subject. An “hypnotic 
induction” procedure and “the 
trance state” may also be superfluous 
to eliciting the behavior which char-
acterizes “hypnotic blind” or “hyp-
notic deaf” subjects; the evidence
reviewed suggests that similar per- 
formances can be induced in normal 
persons by simply instructing them 
to remain inattentive and unrespon- 
sive to visual or auditory stimuli. 

2. A number of physiological ef- 
fects which have been considered as 
peculiar to “the hypnotic state” 
appear to be relatively commonplace 
performances; e.g., although sugges- 
tions of eating a delicious meal are at 
times effective in evoking gastric and 
pancreatic secretions and in inhibit- 
ing gastric hunger contractions in 
some “deeply hypnotized” subjects, 
it is not uncommon for normal per- 
sons to show similar gastrointestinal 
effects when they visualize the inges- 
tion of savory food. 

3. A group of so-called “hypnotic” 
phenomena—production of localized 
blisters, cure of warts, alteration of 
blood glucose levels, production of
tachycardia or cardiac block—can
apparently be elicited with or without 
an “hypnotic induction” in a small 
number of individuals who possess a 
specific lability of the physiological 
systems involved. 

4. An extensive series of experi- 
ments has failed to find a physiologi- 
cal index which differentiates “the 
hypnotic state” from “the waking 
state.” 

5. A series of experiments compar- 
ing the results of an “hypnotic induc- 
tion” procedure with the results of a 
direct suggestive procedure indicates
that appropriately predisposed per- 
sons do not need an “hypnotic induc- 
tion” and need not appear to be in 
“the trance state” to carry out the 
typical behaviors which have been 
associated with the word “hypnosis.” 

6. Further investigations into the 
nature of “hypnosis” might well by- 
pass the concepts of “hypnotic induc-
tion” and “trance state” and focus on
biographical and situational factors 
which may account for certain indi- 
viduals responding to symbolic stim- 
ulation from another person with so- 
called “hypnotic” behavior, whether 
primarily motor responses (e.g., limb 
rigidity, eye catalepsy) or primarily 
physiological responses (e.g., tachy- 
cardia, wart involution). 
The data concerning number con-
cepts of animals which have been
reported so far do not agree with the 
general relationship between position 
in the phylogenetic scale and be- 
havior. Rensch and Altevogt (1953) 
working with an elephant and Hicks 
(1956) with monkeys had limited 
success in establishing a “threeness” 
concept, while Koehler (1943) and 
his collaborators (Arndt, 1939; 
Braun, 1952; Légler, 1959; Marold, 
1939; Sauter, 1952; Schiemann, 1939) 
obtained a “sevenness” level on
several species of birds. 

Salman (1943) reviewed the num- 
ber capacities of animals and found 
inadequate controls which allowed 
operation of rhythmic cues in most 
studies reported prior to 1939. Sal- 
man and other reviewers (Honigman, 
1942; Koehler, 1951; Thorpe, 1956) 
considered rhythmic cues and other 
extraneous variables very well con- 
trolled in the studies reported in 1939 
and thereafter, but did not mention 
the omission of certain operating 
procedures considered standard in 
the United States literature. The 
bird studies, originally reported in 
German, have been uncritically ac- 
cepted by Hicks (1956), Morgan 
(1956), Newman (1956), and other 
American writers. It will therefore 
be the purpose of this paper to reex- 
amine in detail the methodology 
used in studies reported since 1939. 


Definition 


There is considerable agreement 
among most investigators in the defi- 
nition of a number concept. An 
animal is usually required to solve
a problem without the aid of im- 
mediate physical variables. Exter- 
nal cues such as size, shape, color, 
brightness, tactile, odor, etc., as well 
as internal ones arising from rhyth- 
mic motor patterns or other visceral 
or kinesthetic feedback should either 
be absent or randomized from trial to 
trial, so that the numerosity of the 
stimulus constitutes the only con- 
stant variable. Experiments which 
were designed to include any of the 
above physical variables on a non- 
randomized basis will be omitted 
from this review. Included, however, 
will be those in which a number con- 
cept is reported by the authors 
though immediate and constant cues 
could have been responsible for the 
observed behavior. 


BIRDS 


Arndt (1939) tested number ability 
with various tasks in nine pigeons 
which received an average of 4,000 
trials. To prevent rhythmic se- 
quences Arndt presented peas on a 
turntable, exposing one pea at a time. 
Delays from pea to pea were from 1 
to 60 seconds. His pigeon “Blaugrau”
mastered the pecking of five peas 
only and would not touch a sixth pea 
when it appeared in the open slot. 
Another pigeon “Grau” learned a 
fourness problem on its first trial. In 
a tube experiment Arndt dropped 
peas from behind a screen at intervals 
varying from 1 to 20 seconds. On the 
animal’s side the pea fell into a cup- 
shaped receptacle at the end of the 
tube. With this method pigeon
“Braunweiss” responded correctly
on 55% of the trials of a twoness
problem during the last hundred of
915 trials. When subsequently trained
for a threeness problem, it responded
correctly on the first seven trials, which
means that without any negative 
transfer from the previous problem it 
picked the now correct third pea. 
Such an initial and highly accurate 
response strongly suggests the pres- 
ence of extraneous cues. Arndt, how- 
ever, looked upon it as “progress in 
learning,” not realizing that even the
most optimal “learning to learn” 
situation requires some negative 
transfer. In another experimental 
arrangement Arndt employed lid- 
covered boxes on a turntable. Again, 
only one box appeared in an open 
slot at any one time. With a twoness 
task one wheat kernel or one pea was 
placed into each of two successively 
appearing boxes. One pigeon “Blau- 
weiss” learned to open these two 
successive boxes, but would not open 
a third box. When the two baits were 
distributed within three boxes (1, 0,
1), the second box being empty,
“Blauweiss” exhibited immediate
learning, opening now three boxes, 
and leaving the fourth one untouched. 
From this behavior Arndt concluded 
that the bird had not learned to open 
a certain number of boxes, but 
learned to eat a certain number of 
peas. Gradually, within 6,000 further 
trials it learned to take six peas out of 
six boxes, not opening the seventh 
box. During the above experiments 
Arndt noticed that the birds would 
usually remain at the slot of the ap- 
paratus after they had responded 
correctly, and would turn away only 
after the turntable turned to present 
the negative stimuli. Arndt tested 
the possibility of differential accelera- 
tion as an extraneous cue, which may 
have been possible, since the turn-
table was operated manually. He
asked another experimenter to turn 
the table for 600 trials and observed 
no differential results when compared 
with data from his own manipula- 
tions. He failed to note that the other 
experimenter also knew the correct 
number, and that subjective accelera- 
tion cues may have remained con- 
stant from one experimenter to an- 
other. In his review Thorpe (1956) 
describes Arndt, among other experi- 
menters, as having “adopted quite 
extraordinary precautions to avoid 
errors of the ‘Clever Hans’ type” (p. 
344). But “Clever Hans” could also 
solve problems when given by an- 
other experimenter who knew the 
correct answer. 

Arndt obtained 65% correct re-
sponses as the highest level of perform-
ance on sets of 100 trials during
thousands of trials. Such a low level 
of mastery and the frequent absence 
of negative transfer do suggest ex- 
traneous cues with both the tube and 
turntable experiments. Auditory 
variables which could have arisen 
from the experimenter or from a rat- 
tling of baits in the boxes during the 
turning were not controlled. Olfac-
tion received no attention: boxes be-
yond the desired number were not
baited. Another extraneous variable
in Arndt’s methodology could have 
been the nonrandomization of the 
amount of food ingested. Since it can 
be assumed that most of the peas 
were of equal size, visceral feedback 
could have presented a constant and 
immediate stimulus, and the correct 
response could have been based on 
quantity of food rather than on a 
mediated numerical concept. Thus, 
a quantity of food, an odor, a noise, 
or a “subjective” turning speed may 
alone or in combination be responsi- 
ble for the results observed. Arndt’s 
methodology, therefore, does not war-
rant the conclusion that behavior
based on numerosity was exhibited. 

Concurrently with Arndt, Marold 
(1939) tested several parakeets on 
simultaneous and successive tasks. 
One parakeet was trained to dis- 
criminate between groups of two and 
three kernels. No learning was ex- 
hibited within 500 trials. The bird, 
apparently, depended too strongly 
on figure aid and changed to a posi- 
tion habit whenever the figure aid 
was withdrawn. To break this posi- 
tion habit Marold allowed the bird 
to eat the negative group of kernels 
after a positive response and ob- 
served positive results on the sixth 
block of 100 trials. The correctness 
level, however, did not rise above 
57% correct within 1,100 trials. On 
the successive task Marold used rows 
of kernels and required her birds to 
eat x kernels without touching the 
x+1 kernel. Marold’s parakeet 
“Grün” was trained to eat two ker- 
nels from a row varying from three to 
seven kernels. The distance between 
the kernels was altered from .5 to 0 
centimeter and with decreasing dis- 
tance, decreasing accuracy was ob- 
served. At the end of 900 trials the 
bird responded 87% correct, but the 
percentage dropped to 44 on a subse- 
quent block of 100 trials which 
involved a further decrease in spac- 
ing. In an additional block behavior 
resembling experimental neurosis was 
reported. 

Throughout her experiments 
Marold reported large individual 
differences, but she concluded that 
these differences arose from individ- 
ual differences in treatment and in 
“Einfühlung.” Such a statement 
suggests that the birds did receive 
differential treatment intentionally 
or unintentionally, which may have 
accounted for some of the results 
observed. Marold’s simultaneous 
discrimination task was not free from
differential size cues. Likewise, 
extraneous distance cues were pres- 
ent during her successive task, result- 
ing in experimental neurosis at zero 
distance. Such behavior resembles 
the inability to differentiate between 
cues immediately present in the en- 
vironment. The presence of these 
and other extraneous cues makes it 
difficult to ascribe Marold’s observa- 
tion to numerical behavior alone. 

Schiemann (1939) used a new meth-
od for investigating the ability of
birds to act successively to numbers. 
He confronted his jackdaws with a 
row of 10 covered dishes and required 
them to uncover the dishes in se-
quence. Baits were differentially
distributed according to a prear- 
ranged pattern, so that sometimes a 
dish would contain two or more baits 
and sometimes none. His birds had to 
uncover a different quantity of dishes 
to obtain x baits. Odor cues were not 
controlled since the dishes beyond 
the correct number which were not 
to be uncovered did not contain bait. 
One jackdaw “Blau” exhibited its 
upper limit of x=6 and performed 
with 65% correct during 886 trials. 
Another jackdaw learned within 1,000 
trials to differentiate between the 
eating of two, three, four, and five 
baits. 

Schiemann reported that perform- 
ance was lower on a task which 
required the opening of x dishes 
rather than the eating of x baits. 
Schiemann did not vary the amount 
of food per trial, so that on any one 
number task this could have pre- 
sented immediate cues from visceral 
feedback. Such a hypothesis could 
explain the high performance with 
“baits eaten” and the random per- 
formance with “dishes uncovered.” 

Schiemann further attempted a
combination of successive and simul-
taneous number discrimination. He
presented to his jackdaw “Grün” a
stimulus card containing either two
or four dots. According to this sam-
ple number, “Grün” was subsequent-
ly required to peck the indicated
number of baits from a plate. After 
1,200 trials, 70% correct responses
were obtained in the last block of 100.
Schiemann believed that this demon- 
strated a success in the ability to act 
out a previously seen number. It 
should, however, be noted that the 
size of the sample dots remained 
constant throughout this task. With 
stimuli differing in one physical 
dimension, this task may be compared
with disjunctive reaction-time (RT)
experiments and so need not be
related to number concepts. If unknown samples were
presented, Schiemann states, the 
jackdaws appeared to be “completely 
helpless.” 

Koehler (1943) worked intensively 
with a 9-year-old raven named 
“Jacob” which received a total of 
approximately 12,000 trials during 
794 working hours. On a series of 
trials “Jacob” learned to discriminate 
successfully between piles of baits 
having the following ratios: 4:5, 4:6,
6:5, and 7:6 (the first number indi- 
cating the positive stimulus). After 
having mastered these tasks it was 
not possible to train “Jacob” to dis- 
criminate on a 5:6 problem, though 
several hundred trials were admin- 
istered with and without punishment 
and interspersed with rest periods. A 
naive bird was likewise unsuccessful. 
Koehler noted in a later film of the 
experiment that his assistant had a 
tendency to place the positive group 
closer to the forward margin of the 
experimental board during the dis- 
crimination series. Koehler did not 
mention whether correction of this 
placement cue preceded the 5:6 
problem, but if it did the failure 
could be explained. As in the famous
case of Kinnebrook, the assistant was
replaced, but manual placing of baits
was continued.

The most difficult task which 
“Jacob” learned was a multiple 
choice task with a sample-indicator. 
A sample card was placed on the 
ground indicating the required num- 
ber by means of irregular dots. 
Around it five covered dishes were 
placed with their lids showing ir- 
regular dots from numbers two to six. 
The dots on the positive lids differed 
in size, shape, and configuration from 
the ones on the sample. “Jacob” was
able to obtain the reward of either 
grain, fruit, cheese, or meat when all 
aids were withdrawn and when the 
dots were replaced by irregular pieces 
of plasticine. The breaking and 
kneading of the plastic material as 
well as placing it on the lids was done 
manually, and again inadvertent 
cues arising from this manipulation 
cannot be ruled out in the evalua-
tion of the obtained results. Odor
control was likewise seriously lacking 
and initiated only after Tinbergen 
reported to Koehler that a jay was 
able to detect mealworms by odor. 
During the 481 trials which were 
presented with the irregular plasticine 
dots only five were partially odor 
controlled by baiting several dishes. 
“Jacob” responded correctly on all of 
these five trials. 

Braun (1952) worked with three 
parrots to investigate some combina- 
tion tasks. For positive reinforce- 
ment hempseed or cheese was used. 
Negative reinforcement consisted of 
punishment with a stick but was 
applied only when “absolutely neces-
sary.” On other occasions during her
experiments Braun made loud noises,
threw a wet sponge, or pulled tail
feathers as methods of negative rein- 
forcement. 
One of Braun’s parrots performed
on a dish-row problem during a six-
ness task with an average of 75%
correct responses. It is difficult, how-
ever, to evaluate the results and
ascribe them only to a numerical 
concept since the physical proximity 
which the experimenter must have 
maintained with her animals in order 
to administer the various methods of 
punishment could have presented a 
host of extraneous cues. Furthermore 
variables such as odor and amount of 
food ingested were not controlled. 

Eight magpies were used as subjects
by Sauter (1952), who repeated some
of the above tasks. Her food dishes 
were baited and covered in the ob- 
servation room and manually spaced 
10-12 centimeters in the experimental 
room. Her rewards were a variety of 
foods, with type and amount remain- 
ing equal within one task. A scare 
apparatus which was rarely used 
served for negative reinforcement. 
She tested four magpies on a dish- 
row. X baits were distributed in 25 
different ways into 10 dishes. Magpie 
“Prinz” accomplished an x=3 prob- 
lem on this task, but showed no 
transfer effect when a row of ten 
mealworms replaced the row of 10 
dishes. The upper limit, x=7, was 
reported to have been reached with 
magpie “Felix” performing at a level 
of 74% correct at the end of 100 
trials.

Half of Sauter’s bait distributions
on the dish-row problem contained
no zero spacings (empty dishes be-
tween baited ones). If odor cues,
which were not controlled, are postu-
lated, they alone could account for
correct responses on 50% of all trials.
An additional 25% correct responses
could be expected if the bird per-
formed at a chance level of 50% on
the other half of the trials, those
which did have zero spacings and
therewith a possible odor control.
Thus, the odor variable alone could
explain the 74% correct level reported
for “Felix” during the x=7 task. Odor
remained likewise uncontrolled in
Sauter’s simultaneous experiments.
Dishes were manually placed, and the
selection and production of the irreg-
ular pieces of plasticine did not follow a
prescribed procedure to assure the 
nonoccurrence of the experimenter's 
unintentional cues. 
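The odor estimate for “Felix” is an expected-value calculation. The sketch below restates it in Python under the argument's own assumptions. A bird guided only by odor is taken to be always correct on distributions without zero spacings. On distributions with zero spacings it is taken to be correct at a 50% chance rate. The function name is hypothetical.

```python
def odor_only_accuracy(share_without_zero_spacing: float, chance_rate: float) -> float:
    """Expected proportion correct for a subject guided by odor alone.

    Assumption (from the argument in the text): the odor strategy is always
    right on distributions without zero spacings, and right only at
    `chance_rate` on distributions that contain zero spacings.
    """
    by_odor = share_without_zero_spacing * 1.0
    by_chance = (1.0 - share_without_zero_spacing) * chance_rate
    return by_odor + by_chance

expected = odor_only_accuracy(share_without_zero_spacing=0.5, chance_rate=0.5)
print(f"expected with odor alone: {expected:.0%}")  # expected with odor alone: 75%
```

The result, 75%, is close to the 74% correct level reported for “Felix.”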

Braun’s parrot “Jako” was again 
used by Légler (1959) to perform on 
16,076 additional trials on various 
number combinations. One such task 
involved successive presentations of 
flashes of light which varied in num- 
ber and served to indicate the correct 
number of baits to be chosen on a 
dish-row. Numbers up to seven could 
be acted out in this way. Stimulus 
generalization of numerosity was re- 
ported on lower numbers when the 
visual indicators were replaced by 
auditory ones. On a single dish-row 
problem “Jako” was successful in 
obtaining eight baits which were 
differentially distributed into 11 
dishes. The bird’s performance be-
came significant only after 600 trials
at chance level. Though odor was
not controlled, its possible influence
on the successive problem is not
likely since 4/5 of Légler’s bait distri- 
butions contained one or several zero 
spacings. But, again, the manual 
placing of all dishes, the nonrandomi- 
zation of food quantities, and the 
occasional deviations from the in- 
tended methods of scaring could have 
contributed extraneous cues. 

All of the above reported bird 
studies used variable and highly 
subjective types of negative rein- 
forcement. Most experimenters de- 
signed a scare apparatus intended for 
uniform punishment, but abandoned 
it early in their experiments. Arndt 
(1939) changed over to bait with- 
drawal while Marold (1939) used 
blasts of air to blow away negative
kernels but allowed them to be eaten 
in other instances. She also reported 
spraying of water and gypsum pow- 
der into the birds’ faces. At times 
Schiemann (1939) “scared” the birds
but on other occasions darkened the
room. It is not clear in most of
the above reported experiments how 
often and in which instances the 
various punishment methods were 
used. Honigman (1942) and Salman 
(1943) who reviewed the above ex- 
periments believed them well con- 
trolled and Thorpe (1956) more re-
cently termed them “technically
beyond reproach” (p. 349) and stated 
that the use of the punishment ap- 
paratus made it impossible for the 
experimenter to give inadvertent 
signs. He does not mention the 
abandonment of the apparatus and 
the substitute method reported by 
the various authors. 

Another serious lack common to 
the above bird studies is the absence 
of odor controls. Only Koehler (1943) 
controlled it partially in 37 trials out 
of an approximate total of 55,000 
trials given by the above experi- 
menters who offered such variable 
baits as flour, bread, seeds, fruits, 
cooked and raw meats, cheese, and 
others. The general assumption 
that birds are insensitive to odor may 
be quite fallacious. In a well con- 
trolled experiment Zahn (1933) found 
odor sensitivity equal to or surpass-
ing human thresholds for five dif-
ferent odors in experiments with pigeons,
blackbirds, blue titmice, robins,
and hedge sparrows.

Manual placing of turntables, cups, 
lids, baits, dots, or plasticine was 
also present in all of the above bird 
studies. It was reported by Arndt 
(1939) and Koehler (1943) to have 
influenced their results on certain 
occasions. Elimination of these and 
other cues could have been assured
only by complete mechanical presen- 
tation of the stimulus components, a 
methodology not adopted by any of 
the experimenters. 


FISH


The counting capacities of min- 
nows, stickleback, and other small fish 
were investigated by Rossmann 
(1959) who found that innate prefer- 
ences of bait size, stimulus density, 
and motility interfered with nu- 
merosity throughout prolonged train- 
ing on the simultaneous discrimina- 
tion task. One motor act and one 
tonal quantity, however, could even- 
tually be differentiated from two 
acts or two tones on the successive 
task. One minnow, e.g., required a 
sequence of 170 negative reinforce- 
ment trials before it learned to eat 
the first bait without touching the 
second. The experiment was well 
controlled in regard to odor, bait- 
size, and rhythm. Training to num- 
bers above one could not be estab- 
lished and Rossmann concluded that 
a number concept in fish therefore
cannot be postulated.


MAMMALS 


Rodents 


Hassmann (1952) experimented 
with 13 squirrels employing the 
methodology used by Koehler and 
his collaborators with the bird sub- 
jects. She used a variety of nuts and 
seeds as reward and scaring with 
a broom as negative reinforcement. 
On the dish-row task “Grauhörn- 
chen” was reported to have demon- 
strated a concept of fiveness and 
“Hans” one of sixness. Hassmann’s 
simultaneous task required the differ- 
entiation between five lid-covered 
dishes each bearing a number from 
three to seven. These numbers were 
indicated by irregular dots that 
changed in size and position from
trial to trial. One squirrel “Hexer” 
could differentiate the seven-lid from 
the three, four, five, and six-lids. 

There are several observations by
Hassmann which strongly suggest
the presence of extraneous cues. A 
new fourness task was solved with an 
initial correctness equal to that on a previous
threeness task involving 600 trials. 
Hassmann did not interpret this as a 
possible indicator of extraneous cues, 
but termed this behavior “a surpris- 
ing success in learning.” Odor was 
not controlled since negative dishes 
were not baited during a total of ap- 
proximately 15,000 trials, in spite of 
the fact that Hassmann reported an 
aversion on the part of one of her 
animals to newly painted dishes
(Hassmann, 1952, p. 299). On one 
occasion an unplanned odor control 
was reported. A squirrel pushed a 
positive lid aside, without “seeing” 
the peanut in it. It went to some 
negative dishes but returned later to 
the positive dish, opened it com- 
pletely and obtained the bait. If 
“Peter” had smelled the peanut, 
Hassmann maintains, it would have 
continued to displace the positive lid 
on its initial attempt. Hassmann did 
not include peanuts in her previous 
list of rewards and it is difficult to 
determine the amount of acquaint- 
ance “Peter” had with this type of 
reward and its odor. The perform- 
ance was very much like that which 
Tinklepaugh (1932) observed in mon- 
keys when rewards were changed 
during a delayed reaction test. 

Aside from odor the manual placing 
of all cups, lids, and sample dots 
could have presented additional ex- 
traneous cues. Hassmann’s meth- 
odology should be scrutinized since 
she reported the successful learning 
of an oddity task in which the cue for 
solution was always numerosity. If 
this performance occurred without
the aid of extraneous cues it would 
represent one of the highest concep-
tual achievements at the subprimate
level. 

Wesley (1959) investigated nu- 
merosity in the rat in a successive 
task in which the animals were re- 
quired to enter a “second” open alley 
without previously entering a “‘first”’ 
open one. The alleys, their location, 
and their total number changed from 
trial to trial. Some significant runs 
were obtained only by massing trials 
at the end of daily practice sessions, 
linked with nonreinforcement after 
an initial prolonged corrective train-
ing. Osgood¹ pointed out that it is
possible the animal responded by 
avoiding the first open, negative door 
rather than by entering the second 
open one. Thus as in the case of 
Rossmann’s fish the rats may have 
responded only to oneness. 

The rats’ capacity to discriminate 
by numerosity was further investi- 
gated by Wesley on a multiple serial 
visual discrimination apparatus. Rats 
were able to perform on a twoness 
task after approximately 100 trials 
and showed negative transfer to a 
subsequent threeness task. Dis- 
crimination of threeness was acquired 
but not maintained after the exclu- 
sion of triangularity. 


Elephant 


The visual learning capacity of an 
elephant was studied by Rensch and 
Altevogt (1953) who presented three- 
and four-dot patterns on stimulus 
cards. After almost 100 trials the 
elephant was able to distinguish cor- 
rectly between irregular dots on a 
3:4 discrimination problem, but only 
with constant arrangement of the 
stimulus dots. Since the positive
three-dot pattern was always pre- 
sented in one of seven arrangements 
and the negative four-dot pattern 
in one of five arrangements, the ele- 
phant could have solved the entire 
task by learning five different Gestal- 
ten, and it is therefore questionable 
whether the animal had the abstrac- 
tive capacity the experimenters sug- 
gest. 
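The argument that the elephant could have succeeded by memorizing arrangements rather than by abstracting number can be made concrete. The Python sketch below uses invented dot coordinates on a 5×5 grid, because the actual cards are not described here. It also assumes a two-choice trial format. The names `make_cards` and `memorizing_chooser` are hypothetical. The chooser never counts dots. It only recognizes the five negative four-dot patterns as whole configurations, yet it is correct on every pairing drawn from a closed set of seven positive and five negative cards.

```python
import random

def make_cards(n_cards: int, n_dots: int, rng: random.Random) -> list[frozenset]:
    """Generate distinct hypothetical cards, each a set of dot positions on a 5x5 grid."""
    grid = [(x, y) for x in range(5) for y in range(5)]
    cards: set[frozenset] = set()
    while len(cards) < n_cards:
        cards.add(frozenset(rng.sample(grid, n_dots)))
    return sorted(cards, key=sorted)

def memorizing_chooser(known_negatives: set[frozenset]):
    """Return a chooser that avoids memorized negative patterns and never counts dots."""
    def choose(left: frozenset, right: frozenset) -> frozenset:
        return right if left in known_negatives else left
    return choose

rng = random.Random(1953)
positives = make_cards(7, 3, rng)  # seven hypothetical three-dot arrangements
negatives = make_cards(5, 4, rng)  # five hypothetical four-dot arrangements
choose = memorizing_chooser(set(negatives))

# Every positive card paired with every negative card, with left/right side randomized.
trials = []
for p in positives:
    for n in negatives:
        trials.append((p, n) if rng.random() < 0.5 else (n, p))

correct = sum(choose(left, right) in positives for left, right in trials)
print(f"{correct} of {len(trials)} correct")  # 35 of 35 correct
```

The chooser has no rule for arrangements it has not memorized. This is why introducing new stimulus cards on test trials, as Hicks (1956) did, is the relevant control.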


Monkeys 


Douglas and Whitty (1941) re- 
viewed the literature of number ap- 
preciation in subhuman primates and 
tested four baboons in a visual dis- 
crimination experiment. They pre- 
sented either one or two successive 
flashes and required a different re-
sponse to each cue. When the two
cues were subsequently equated for
duration, the proportion of correct
responses fell to a low value.

Kühn (1953) investigated the abil-
ity to differentiate visually between
black dots of varied sizes and ar- 
rangements. His 2-year-old rhesus 
monkey “Lola” received a total of 
18,718 trials within 439 working 
hours. Kühn used 50 discrimination 
cards per number throughout his 
experiment and presented these on 
training and on test trials. He re- 
ported learning to discriminate num- 
ber on an 8:6 task, but it should be 
noted that the cards of the six series 
were presented 500 times prior to this 
task, always designating the negative 
stimulus. Responses may have oc- 
curred to individual cards and not 
necessarily by means of the number 
concept they presented. A similar 
type of learning may have been in- 
volved in the solution of the 8:7 task. 

Hicks (1956) investigated the num- 
ber concept in eight adolescent 
rhesus monkeys. His methodology 
was free of extraneous cues, since he 
introduced new and different stimu-
lus cards during test trials. All of his
animals performed above chance on a 
threeness problem, though some with 
rather moderate proficiency. Hicks 
compared his results with other stud-
ies and assumed that the 8:7 dis-
crimination level observed by Kühn
represents a true number concept but 
he had some doubt whether his own 
positive results indicated a number 
concept per se, since in all tests of 
number concepts the stimuli possess 
other characteristics than number. 
If, however, such a definition is em- 
ployed no number concept per se 
could ever be demonstrated even on a 
human level, as stimulation always 
involves physical characteristics in 
addition to numbers. Heidbreder 
(1946), e.g., could not present two- 
ness to her human subjects without 
involving objects, gestalten or size. 


CONCLUSIONS 


The performance of birds on a 
sevenness level has been compared 
to the human level of subitizing, an 
estimating of number without count- 
ing, where seven seems to form the 
average upper limit (Jevons, 1871;
Miller, 1956). A re-examination
of the methodology of these bird 
studies, however, makes such a com-
parison invalid and calls into ques-
tion performance at any numerical level.
Phylogenetically, the monkey would 
be expected to perform closer to 
the human level, but at present 
threeness is the only level unequivo- 
cally established with this species. 
The numerical capacity above three- 
ness needs further investigation with 
monkeys, as numerosity in general 
needs to be studied further through- 
out the entire phylogenetic scale. To 
free future experiments from the influence of extraneous cues, the presentation of the stimuli should be mechanical and should randomize time, distance, size, and amount of food and should control odor, noise, and other possible immediate cues. It is very likely that the use of rigid

The alleged phenomenon of hypnotic age regression refers to the apparent fact that a subject (S) who is told under hypnosis that he is, e.g., 4 years old may behave in a manner characteristic either of his own behavior at that age or of children in general at that age. It should be noted carefully that the phenomenon may refer to the reactivation of behavioral characteristics of S himself, or to a more general revivification of childlike behavior.

According to Platonow (1933), 
hypnotic age regression was first 
demonstrated clinically in 1893 by 
Krafft-Ebing. In spite of a good deal
of clinical interest in the alleged 
phenomenon, experimental interest 
in the problem remained dormant 
until the publication of Platonow’s 
report, in which he claimed that hyp- 
notic age regression had been objec- 
tively demonstrated in three Ss, us- 
ing the Binet test, and that the gen- 
eral behavior of the Ss in the re- 
gressed state showed characteristic 
childlike features. This conclusion 
was challenged by Young (1940), 
who claimed that Ss in the waking state were able to reproduce voluntarily the test performance and general behavior of a young child, and with greater
accuracy than hypnotized Ss who 
had been regressed. Curiously, this 
challenging problem of the authentic- 
ity of hypnotic age regression has 
not received a great deal of attention 
since Sears (1943) first reviewed it 
briefly. A review of the literature suggests, however, that the problem is now much more clearly defined and that a number of facts can be accepted as reasonably well established. The principal controversies have centered on whether regression (partial or complete) can be demonstrated; if so, whether the regressed state can be simulated by S or represents a genuine reactivation of previous habit-systems or personality organizations; and, finally, whether or not hypnosis is an essential part of the process.

1 Now at the University of Western Australia. Acknowledgement is due to D. Howie for valuable criticisms of this paper.

This review will cover the types of 
measures which have been used to 
compare the test performance and 
behavior of S in the waking state
with those in the hypnotic state, the
various conditions under which the 
performance is recorded, the princi- 
pal established results, the main 
theories which attempt to account 
for the phenomenon, and the method- 
ological problems involved. Some 
suggestions for future research will 
be made. 


TYPES OF COMPARISON


A distinction may be drawn between direct and indirect comparisons² of the waking and hypnotic states, and between measures that are susceptible to simulation to a greater or lesser degree and those that on the whole are not susceptible to simulation.³


2 The problem is similar to that faced by 
the clinical psychologist attempting to meas- 
ure deterioration (Yates, 1956). 

3 The distinction is, of course, an arbitrary 
one, but it does help in organizing the field. 
Direct Comparisons


A few studies have directly com- 
pared S's present regressed perform- 
ance with his known performance on 
the same measure at a time cor- 
responding to the regressed age level. 

Simulable Measures. Sarbin (1950b) 
compared the performance of 12 Ss 
regressed to the age of 8 years with 
their performance on the same test 
when they were actually 8 years old. 
Similarly, Orne (1951) was able to 
compare the original and regressed 
drawings of one of his Ss. 

Nonsimulable Measures. True 
(1949) regressed his Ss successively to 
ages 10, 7, and 4 years, and asked 
them what day of the week their birth- 
day and Christmas Day fell on in that 
year. Best and Michaels (1954) and Reiff and Scheerer (1959) performed similar experiments; the latter additionally attempted to reactivate information about childhood experiences (e.g., names of teachers and classmates) which Ss claimed to be unable to recall in the waking state.
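True's question has an objective answer key: the weekday on which a given date fell can be computed exactly, whereas S cannot easily work it out during the session. A minimal Python sketch of how such a key might be prepared follows. The birth date, the function name `answer_key`, and the reading of "that year" as the calendar year in which S reached the regressed age are assumptions made for illustration; none of them comes from True's report.

```python
from datetime import date

# Weekday names indexed by date.weekday() (Monday == 0), spelled out
# explicitly so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def answer_key(birth, ages=(10, 7, 4)):
    """Return {age: (birthday weekday, Christmas Day weekday)}.

    Assumption: "that year" is read as the calendar year in which S
    reached the given age. A 29 February birth date is not handled.
    """
    key = {}
    for age in ages:
        year = birth.year + age
        birthday = birth.replace(year=year)
        christmas = date(year, 12, 25)
        key[age] = (WEEKDAYS[birthday.weekday()],
                    WEEKDAYS[christmas.weekday()])
    return key


if __name__ == "__main__":
    # Hypothetical subject born 14 March 1920 (not taken from any study).
    for age, (bday, xmas) in answer_key(date(1920, 3, 14)).items():
        print(f"age {age}: birthday on {bday}, Christmas Day on {xmas}")
```

A waking-state control in which S deliberately tries to recall or work out the same dates is still needed before correct answers can be credited to regression.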

These studies appear to represent 
the only sources of direct comparison 
thus far made. 


Indirect Comparisons 


In the majority of studies on hyp- 
notic age regression the present re- 
gressed performance of S is compared 
with the average known performance 
of normal Ss of the age to which re- 
gression is induced. 

Simulable Measures. There are three sources of evidence here. First, in the field of mental testing, investigators have used the Binet test (Keir, 1945; Platonow, 1933; Spiegel, Shor, & Fishman, 1945), the Wechsler-Bellevue Intelligence Scale (Kline, 1951), the Otis Performance tests (Kline, 1950), the Word Association Test (Dittborn, 1951; Keir, 1945), and various kinds of motor tasks, such as drawing a man, handwriting, etc. (Orne, 1951; Platonow, 1933). Second, the general behavior of S in the regressed state has been compared with that of normal children of that age level (Keir, 1945; McCranie, Crasilneck, & Teter, 1955; Orne, 1951; Platonow, 1933). Third, S has been placed (in the regressed state) in situations known to evoke intense fear responses in many young children, and his behavior has been observed (Kline, 1953a).

Nonsimulable Measures. Three kinds of information have been utilized. First, in the field of mental testing, the Bender Gestalt Test (Crasilneck & Michael, 1957) and the Rorschach test (Bergman, Graham, & Leavitt, 1947; Keir, 1945; Mercer & Gibson, 1950; Norgarb, 1952; Orne, 1951) have been used, the argument being that it would be difficult for adults to simulate the performance of young children on these tests, especially on the unstructured Rorschach. Second, a number of physiological measures have been recorded while S was in the regressed state. These include the presence or absence of the plantar response (Gidro-Frank & Bowersbuch, 1948; True & Stephenson, 1951) and of the Babinski reflex (McCranie et al., 1955); changes in indices such as blood pressure, pulse and respiration rates, and the psychogalvanic reflex (Kline, 1960; True & Stephenson, 1951); and changes in EEG characteristics (McCranie et al., 1955; True & Stephenson, 1951), the argument in the last case being that the records of adults and children differ characteristically.


Third, special mention must be made of the advances in technique recently reported by Reiff and Scheerer (1959). In accordance with Scheerer’s general theoretical position, they abandoned mental age tests and used instead the notion of developmental levels, analyzing S’s approach to the solution of various problems rather than the correctness of his responses. They used a number of ingenious tests which they considered would be particularly difficult to simulate. Thus, in the Lollipops test, the regressed adult was given a lollipop after making mud-pies, while his hands were still dirty. If true regression had taken place, they argued, the child regressed to 4 years would naturally not worry about his dirty hands when accepting the lolly, whereas the adult simulating regression would. On the cognitive side, they used a Pledge of Allegiance test (writing the pledge after reciting it), a Clock test (telling the time), a Left and Right test (identifying left and right, e.g., in persons sitting opposite), an Arithmetic test, Piaget’s Hollow Tube Test (identifying the order in which colored beads will emerge from a hollow opaque tube after it has been rotated), and a Word Association test. On all of these tests, children show characteristic changes in modes of response with increasing age. On the Word Association test, for example, the most popular responses among children are quite different from those found among adults. They further argued that the adult who simulated a child’s responses on this test (correctly or incorrectly) would show increased reaction time, since he would first need to inhibit his natural adult response tendency.
TESTING CONDITIONS


Six distinct testing conditions have 
been employed. Four of these may 
be regarded as control conditions for 
the two extreme conditions of per- 
formance in the normal waking state 
compared with performance in the 
suggested regressed state under hyp- 
nosis. The six conditions are: 

1. Normal waking state—With 
some exceptions (e.g., Platonow,
1933; Sarbin, 1950b; Spiegel et al., 
1945), nearly all investigators record 
performance in this condition. 

2. Normal waking state, with de- 
liberate (not simulated) attempted 
recall of earlier events—This control 
condition has seldom been used, 
though it is clearly essential to the 
validity of results such as those ob- 
tained in True’s (1949) experiment. 

3. Normal waking state, with in- 
structions to simulate regression to a 
particular age-level—This control 
condition was used by Reiff and 
Scheerer (1959). 

4. Standard hypnotic state—This 
represents a control for the effects 
of hypnosis per se, and has rarely 
been used. 

5. Hypnotic state, with instruc- 
tions to simulate regression to a par- 
ticular age-level—This condition has 
been used only once experimentally 
(Crasilneck & Michael, 1957). 

6. Hypnotic state, with direct sug- 
gestion by E that S is now a certain 
age. 

No single study has used all of 
these conditions, and only one (Crasil- 
neck & Michael, 1957) has used as 
many as four. 

An important requirement of any experiment in this field is that judgments of behavior in the various conditions be made in ignorance of the particular condition in which S was placed at the time of assessment; i.e., the double-blind technique should be used.
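One way this requirement might be met in practice is to strip each protocol of its subject and condition labels, shuffle the protocols, and give the judges only arbitrary codes, with the key held by someone else until all ratings are in. The Python sketch below illustrates that procedure; the record layout, the `P001`-style codes, and the function name `blind` are hypothetical and are not taken from any study reviewed here.

```python
import random


def blind(records, seed=None):
    """Prepare protocols for blind rating.

    records: iterable of (subject, condition, protocol) tuples.
    Returns (coded, key):
      coded: list of (code, protocol) pairs in shuffled order, for judges;
      key:   dict mapping code -> (subject, condition), withheld from judges.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)  # presentation order no longer reveals S or condition
    coded, key = [], {}
    for i, (subject, condition, protocol) in enumerate(shuffled, start=1):
        code = f"P{i:03d}"
        coded.append((code, protocol))
        key[code] = (subject, condition)
    return coded, key


if __name__ == "__main__":
    demo = [("S1", "waking", "drawing A"), ("S1", "hypnotic regression", "drawing B"),
            ("S2", "waking", "drawing C"), ("S2", "waking simulation", "drawing D")]
    coded, key = blind(demo, seed=1)
    print(coded)  # judges see only codes and protocols
```

Shuffling and coding do not by themselves ensure blindness if the protocols carry cues of their own (for example, an examiner's note mentioning the trance), so such material would also have to be removed before rating.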


PRINCIPAL RESULTS 


Regression can be produced under 
hypnosis (Condition 6), the extent 
and accuracy of the regressed behav- 
ior being a matter of considerable 
dispute. Thus, Sarbin (1950b), using
direct comparisons of Binet perform- 
ances, found that under hypnosis not 
one of his nine hypnotizable Ss 
achieved a mental age as low as that 
on the original test occasion. Using 
indirect comparisons, Crasilneck and 
Michael (1957) regressed their Ss to 
the age of 4, but found that the 
Bender Gestalt drawings were rated 
by independent judges as compar- 
able with those of 7-year-old chil- 
dren. It is not surprising, therefore, 
that inconclusive results have been 
generally reported for nonsimulable 
complex tests such as the Rorschach, 
except for obvious measures such as 
number of responses and form-qual- 
ity (Bergman et al., 1947; Orne, 
1951). On the other hand, it is im- 
portant to notice that such regres- 
sion as is achieved is often remark- 
ably successful. Kline (1950) showed 
that in spite of a very significant de- 
cline in score on the Otis Perform- 
ance tests under regression (from 
59.2 in the waking state to 24.5 under 
regression to 8 years), the IQ at the 
regressed ages showed less variability 
than is normally found when the test 
is repeated on separate occasions. 
Reiff and Scheerer (1959) reported 
almost uniformly perfect regression 
under hypnosis on all of the measures 
they used. Thus, on the Clock test, 
adults regressed to age 7 made 
errors characteristic of children of 
that age level, whereas the simulating controls did not make such errors.

Regression, however, does seem to 
become more accurate as the functions measured become more specific.
True (1949), using 50 Ss in the ex- 
periment described earlier, found 
that when regressed to ages 10, 7, 
and 4, 92%, 84%, and 62% of Ss, 
respectively, correctly identified the 
day of their birthday, and 94%,
86%, and 87% of Ss, respectively, 
correctly identified the day on which 
Christmas Day fell. Best and Mi- 
chaels (1954) found negative results 
in a similar experiment, but they 
used only five Ss and their procedure 
differed in important respects from 
that of True. Equally remarkable 
results were obtained by Reiff and 
Scheerer with the Word Association 
test. Thus, while adults uniformly respond to the word “man” with “woman,” children equally uniformly respond with the word “work.” Their hypnotically regressed Ss responded with the characteristic child’s response, while simulating Ss continued to use the adult response word.
McCranie et al. (1955) reported the 
reinstatement of the Babinski reflex 
in 3 of their 10 Ss when regressed 
to the age of 1 month; while Gidro- 
Frank and Bowersbuch (1948) found 
significant changes in the plantar re- 
sponse, which were accompanied by 
changes in peripheral chronaxie. Mc- 
Cranie et al. (1955) did not, however, 
observe any significant change in 
EEG records in the regressed state. 

Moody (1946), Ford and Yeager (1948), and Erickson (1937) have all reported the reinstatement under hypnotic regression of disabilities (wheal marks, homonymous hemianopsia, attacks of unconsciousness) that are no longer present in the waking state.
The general behavior of the regressed S has frequently been reported as becoming more childlike (e.g., Reiff & Scheerer, 1959), even to the extent of the appearance of the sucking response and loss of speech when regression is induced to a very early age (McCranie et al., 1955). In many respects, too, the behavior is appropriate not merely to the age to which S has been regressed but also to the environment as it was at that time.

Regression can be produced in the 
waking state by asking S to simulate 
the suggested age (Condition 3). 
Under these circumstances, Crasil- 
neck and Michael (1957) showed 
that Bender Gestalt drawings will re- 
flect the simulated age level less 
successfully than is the case with 
hypnotic regression; and this finding 
has been amply substantiated by 
Reiff and Scheerer (1959). 

Regression can be produced in the 
hypnotic state by asking S to simu- 
late the suggested age (Condition 5). 
Under this condition, Crasilneck and 
Michael (1957) showed that Bender 
Gestalt drawings will not be signifi- 
cantly different in quality from those 
produced by direct suggestion under 
hypnosis. 

The results obtained by Crasilneck 
and Michael (1957) for the waking 
state, waking state with simulated 
regression, hypnotic state with simu- 
lated regression, and hypnotic state 
with induced regression, for the 
Bender Gestalt test indicated that 
regression was not complete under 
any of the conditions, that regression 
could be simulated, and that hypnosis 
facilitated the production of regressed 
behavior. Reiff and Scheerer (1959) did, however, find complete regression to appropriate developmental levels.
There is some indication that emotional regression can be induced. Kline (1953a) regressed a female S to the age of 3 years and placed her in situations frequently found to produce intense fear responses in young children (being left alone, entering a dark passage, seeing a strange person oddly dressed, sudden appearance of a live snake, sight of a headless doll, and presence of a live mouse). All but the first and last situations produced realistic fear reactions, including involuntary urination, in the regressed but not in the waking state. Analogously, Reiff and Scheerer (1959) found, in a play situation, less repugnance to eating with filthy hands under regression.

The mere induction of hypnosis it- 
self does not produce regressed be- 
havior in the normal S (Bergman et 
al., 1947; Kline, 1953a), though there 
are clinical reports of “spontaneous 
regression” under hypnosis (Gill, 
1948; Keir, 1945; Schneck, 1955). 

Sarbin (1950b) has reported a cor- 
relation of +0.91 between a regres- 
sion index and degree of hypnotiza- 
bility. His hypnotizable Ss were re- 
gressed under hypnosis, and later 
asked to simulate regression in the 
waking state. A regression index 
(RI) was computed for S according 
to the formula: 


$$
RI = \frac{MA\,(\text{simulated regression})}{MA\,(\text{original test})} - \frac{MA\,(\text{hypnotic regression})}{MA\,(\text{original test})}
$$
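If the index is read as RI = MA(simulated regression)/MA(original test) − MA(hypnotic regression)/MA(original test), the two terms share a denominator, so RI = (MA(simulated) − MA(hypnotic))/MA(original). It is therefore positive when S's mental age under hypnotic regression is lower than under waking simulation. This reading of the printed formula, which is badly garbled in the source, is an assumption. A minimal Python sketch with invented mental ages:

```python
def regression_index(ma_original, ma_simulated, ma_hypnotic):
    """Regression index under the assumed reading of Sarbin's formula:

        RI = MA(simulated)/MA(original) - MA(hypnotic)/MA(original)

    Positive RI: mental age under hypnotic regression was lower than
    under waking simulation, relative to the original test.
    """
    if ma_original <= 0:
        raise ValueError("original mental age must be positive")
    return ma_simulated / ma_original - ma_hypnotic / ma_original


if __name__ == "__main__":
    # Invented values in years: original test 8.0, waking simulation 10.0,
    # hypnotic regression 9.0.
    print(regression_index(8.0, 10.0, 9.0))  # 1.25 - 1.125 = 0.125
```

On this reading, a high positive correlation between RI and hypnotizability means that the more hypnotizable Ss regressed further under hypnosis than they could by deliberate simulation.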


We may conclude from this brief 
survey of results that a prima facie 
case appears to have been made out 
for the assertion that under some 
conditions certain adults behave in 
ways which are characteristically 
those of children, although many of
the details remain to be filled in. 


THEORIES OF HYPNOTIC
AGE REGRESSION


Three theories have been proposed 
in attempts to account for the above 
results. 


Neurological Theories 


Platonow (1933) explained regression in terms of what he called Pavlov’s “true physiology of the brain,”
using especially the notion of words 
as conditioned stimuli producing 
physiological, biological, and psycho- 
logical changes: 
the suggestion of previous ages brings forth a 
real organic reproduction of the engrams, the 
formation of which belongs to the earlier 
periods of the individual's life (p. 205). 


Regression is facilitated under hyp- 
nosis because the latter involves gen- 
eral inhibition of the cortex except 
for the area receiving auditory im- 
pulses. Under these conditions, the 
auditory stimulus (suggestion of re- 
gression) most readily activates the 
appropriate engrams. This theory 
would appear to be derived from the 
much older distinction between corti- 
cal and subcortical brain-processes, 
the latter mediating primitive responses. It is interesting to note that McCranie et al. (1955) assert that lesions of Brodmann’s Area 4 result in
the restoration of the Babinski reflex 
in chimpanzees and man, while 
simultaneous bilateral ablation of 
Areas 4 and 6 produces infantile 
motor behavior in lower primates. 
Kline (1953b, 1954) has proposed a 
“neuropsychological’’ theory, derived 
from the experimental observation 
that what he terms habit progres- 
sion, as well as habit regression, can 
be demonstrated (Kline, 1951; Ru- 
benstein & Newman, 1954). Thus, 
in one study by Kline (1951), a 22-year-old woman was able, under
hypnosis, to produce the typical 
Wechsler-Bellevue record of a 65- 
year-old woman, even to the extent 
of obtaining a characteristic Deteri- 
oration Index score for that age- 
group. Kline’s theory (1953b) postu- 
lates that 

the actual state involved in such activity is 
not regression, not progression, but a central 
state of perceptual release or disorientation 
which permits activity in any dimension or 
direction of time-space orientation (p. 26). 


Under hypnosis, there occur what Kline (1953b) calls “directional alterations from a central process” (p. 25), and he lays particular stress on the importance of transference relationships between S and the hypnotist. He does not, however, regard the phenomenon as simply involving role-taking (see below), and his theory, in particular, does not regard the evidence obtained from psychometric measures as crucial, though he does not deny their relevance. As at present formulated, Kline’s theory would seem to be too general to be amenable to disproof.


Habit-Reactivation 


Hypnotic age regression may be re- 
garded as a special instance of instru- 
mental act regression. In the latter, 
if S possesses alternative response 
patterns to a given stimulus, the 
stronger will normally occur. If, 
however, the stronger is prevented 
from occurring, then the inhibited re- 
sponse pattern will be reactivated. It 
is possible that, in some way as yet 
unknown, hypnotic suggestion of re- 
gression may inhibit current response 
patterns, and hence permit the reac- 
tivation of “forgotten” response pat- 
terns. Although this theory as pre- 
sented here is very general, it is sur- 
prising that no consideration has yet 
been given by any worker in this 
field to the relationship between hypnotic age regression and instrumental act regression. Conversely, the theory would predict that newly acquired conditioned responses would be lost in hypnotic age regression. McCranie and
Crasilneck (1955) attempted to test 
this hypothesis by setting up volun- 
tary hand withdrawal, and involun- 
tary eyeblink, conditioned responses. 
Under hypnotic regression, the former 
disappeared, the latter did not. Sim- 
ilar results were reported by LeCron 
(1952). 

Reiff and Scheerer (1959) have put 
forward a theory of hypnotic age re- 
gression which is derived from a gen- 
eral theory of remembering, but 
which could equally be regarded as a 
more general theory of habit reacti- 
vation than the instrumental act re- 
gression theory. According to them: 
the act of recall becomes also an act of contemporary reconstruction of the past event, to a large extent dependent upon the state of the person at the time of the recall (p. 15).


Such memories can take the form either of remembrances or of memoria. In the waking state, remembrances are memories with a conscious autobiographic index (i.e., experienced as “being in my past”) and therefore involve personal continuity. Memoria are memories without a
current autobiographic past reference 
(e.g., motor skills, vocabulary, etc.). 
Although the distinction is not abso- 
lute, remembrances are usually re- 
lated to an experiential context, 
whereas memoria are related to an 
environmental context. Both kinds 
of remembering may arise volun- 
tarily or involuntarily. 
Hypnotic age regression 

makes possible a reinstatement of the forgotten personal past either in the form of remembrances, or in the form of memoria and earlier ego apparatuses (p. 52).... [In general] since . . . memoria are without an experiential auto-biographic index, the ego can more easily activate appropriate memoria than remembrances (p. 49).... [but] the age-regressed subject may remember events with the experience of an auto-biographic index. However, here the reference point is no longer the actual present but that point in the autobiographic past to which the subject has been regressed (p. 52).


Remembrances or memoria reactivated in this way are always involuntary.

In hypnotic age regression, there- 
fore, the attempt is made to reacti- 
vate the general environmental and 
experiential contexts of the S at the 
age regressed to. If these remem- 
brances can be activated sufficiently 
strongly, then individual items of be- 
havior may be reactivated in the 
form of memoria. It should be noted 
that in regression, memoria and re- 
membrances are experienced as oc- 
curring here and now, whereas in the 
normal state they can both be re- 
ferred to the past, though only re- 
membrances will have autobiographi- 
cal references. 


Role-Playing 

This viewpoint has been well ex- 
pressed by the assertion that 
we may formulate the concept of age regres- 
sion by saying that the prevailing psychologi- 
cal condition enables the individual to take 
the role appropriate to the imagined world 
(Orne, 1951, p. 220). 


The most important exponent of 
this theory is Sarbin (1950a; Sar- 
bin & Farberow, 1952). Sarbin’s 
theory holds that all human interac- 
tion involves both S and E indulging 
in role-playing. The validity of role- 
playing depends upon a number of 
factors, including the validity of S’s perception of the interaction situation, the aptitude of S for playing a
particular role, and the current or- 
ganization of the self. Thus, if age 
regression is to appear 
the S’s perception of the child’s role must have some veridical properties; there must be present evidence of the role-taking aptitude, and the assigned role must not be incongruent with the S’s current self-perceptions (Sarbin & Farberow, 1952, p. 119).


Sarbin’s theory has resulted in a 
number of interesting predictions, 
for example, that an S whose self- 
organization is relatively undevel- 
oped should show greater regression 
than an S whose self-organization is 
normal. Unfortunately, little con- 
crete evidence in support of Sarbin’s 
theory, in so far as it relates to age 
regression, has yet been produced. 


METHODOLOGICAL PROBLEMS 


Many methodological problems 
present themselves in connection 
with hypnotic age regression. They 
may be grouped into five areas.


Types of Control 


Mention has already been made 
of some of the conditions under 
which testing has been carried out. 
Logically, at least three sets of fac- 
tors can be varied, giving eight possi- 
ble combinations of testing condi- 
tions. Thus, S may be tested in the 
hypnotic or waking state, the com- 
parisons made may be direct or in- 
direct, and regression or simulation 
may be attempted. Additionally, account must be taken of the effects on performance of hypnosis per se and of the waking state per se.
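The count of eight follows from crossing three two-valued factors (2 × 2 × 2 = 8). A short Python sketch that enumerates the cells may help in checking which combinations a given study covers; the factor labels paraphrase the text and are not terminology from any particular study.

```python
from itertools import product

# The three two-valued factors named in the text.
STATE = ("hypnotic", "waking")
COMPARISON = ("direct", "indirect")
ATTEMPT = ("regression", "simulation")


def testing_conditions():
    """Return all (state, comparison, attempt) combinations."""
    return list(product(STATE, COMPARISON, ATTEMPT))


if __name__ == "__main__":
    for i, cell in enumerate(testing_conditions(), start=1):
        print(i, *cell)
```

The plain hypnotic and waking baselines mentioned in the last sentence fall outside this grid and would be added as separate control cells.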


Criteria for Regression 


It is obvious that complete bio- 
logical regression is impossible (the S 
in the regressed state does not, for 
example, diminish in stature). The 
critical question therefore becomes: 
“Does the hypnotically regressed 
adult perform as he imagines a child 
to function or is his regressive behavior a revival of memoria, i.e. of certain aspects of his previous functioning?” (Reiff & Scheerer, 1959,
p. 83). It should be realized at the 
outset that the fact that hypnotic age 
regression may be incomplete does 
not in itself prove the validity of the 
role-playing theory; nor does the 
demonstration of successful role- 
playing in itself disprove the validity 
of regressive phenomena. The most 
satisfactory evidence for the validity 
of regression would involve the dem- 
onstration that S performed in the 
regressed state in a manner similar 
to his behavior at that age when a 
child (thus involving direct compari- 
son) together with the demonstra- 
tion that he was unable, as an adult, 
either under hypnosis or in the wak- 
ing state, to simulate the appropriate 
behavior. Evidence of this kind has 
thus far been presented only in iso- 
lated cases. It is not here intended to 
deny, of course, that in most in- 
stances of hypnotic age regression, 
both true regression and role-playing 
may be simultaneously involved. 


Hypnotic Technique 


Several important points have 
commonly been neglected. Criteria 
for measuring the depth of trance are 
usually not reported in sufficient de- 
tail. The speed with which regres- 
sion is induced is possibly a critical 
variable and may explain the failure 
of Best and Michaels (1954) to repeat 
the results of True (1949). It is probably important to reinstate the
earlier period gradually, rather than 
suddenly. The role of the hypnotist 
is often neglected—it has been argued 
that true regression becomes more 
likely if the hypnotist transforms 
himself into some person familiar to 
S at the regressed age, or at least 
into a neutral figure. Reiff and 
Scheerer (1959) lay particular stress 
on the importance of the instructions 
given to S, who should be regressed
to a specific date (e.g., a birthday) 
and not merely to a particular year. 
Fluctuations in performance should 
be controlled as far as possible by in- 
structing S not to deviate from his re- 
gressed age-level. 


Selection of Ss 


Very little attention has been paid 
to this important aspect of the prob- 
lem. Reiff and Scheerer (1959) lay 
great stress on the difficulty of obtaining suitable Ss, who must be relatively free from anxiety (severe anxiety about events happening at age
4, for example, might well lead to 
resistance to regression to that age), 
must be suitably motivated, and, of 
course, satisfactorily hypnotizable. 
Such Ss are relatively rare. Selection 
of control Ss has been even more neg- 
lected. Reiff and Scheerer, for ex- 
ample, give few details about their 
control Ss and do not seem to realize 
that the experimental and control Ss 
should have been carefully matched 
on all relevant variables (including 
hypnotizability). This failure was 
especially serious in that, with the 
exception of two measures, Reiff and 
Scheerer did not control for perform- 
ance in the waking state, apparently 
assuming that all their Ss would per- 
form normally. 


Selection of Tests and Measures 


The search for nonsimulable tests has been markedly advanced by the suggestions of Reiff and Scheerer who, in addition to the tests they used, have made a number of ingenious suggestions for further research. The most significant measures thus far utilized are undoubtedly the Birthdays test of True (1949) and the Word Association test used by Reiff and Scheerer (1959). The latter authors, however, prefer developmental schedules to mental age scales, and are more interested in the process of solution than in the solution itself. While the distinction is not entirely academic, it certainly does not have the importance attributed to it by Reiff and Scheerer. Two minor points may be noted: the tasks, measures, or developmental schedules used should be appropriate to the age level regressed to; and care should be taken to prevent S from giving no response (“I don’t know”) except where such a response is explicitly predicted.

4 Their five hypnotized Ss were chosen from an original group of over 100 Ss.

It may safely be said that no fully 
adequate experiment has been car- 
ried out in this field. Thus, the most 
recent study by Reiff and Scheerer 
(1959), although admirable in many 
respects, contained a number of seri- 
ous faults: lack of control for per- 
formance in the waking state; failure 
to match experimental and control 
groups; and repeated testing at dif- 
ferent age levels of the same Ss in the 
experimental, but not in the control 
group. Even more serious, it is clear 
from the description given of the ex- 
perimental procedure, that the au- 
thors were aware, in the testing situa- 
tion, of which Ss had been hypno- 
tized, and which had not. 


DISCUSSION 


The potential importance of the 
phenomenon of hypnotic age regres- 
sion can scarcely be overestimated. 
Apart altogether from its possible 
value in general psychotherapy 
(Kline, 1950), and its usefulness in 
particular for the treatment of war 
neuroses by regressing the patient 
to the traumatic situation and mak- 
ing him relive the experience, it does 
not seem to have been generally re- 
alized that the technique itself could 
provide a crucial test of the theory that learned responses are never “destroyed,” but only supplanted
and remain available for activation 
under appropriate circumstances. In 
light of this, it is surprising how little 
experimental work has been carried 
out in this area. Furthermore, a 
good deal of this work can hardly be 
said to attain even minimally accept- 
able levels of methodological ade- 
quacy. 

Any acceptable theory of hyp- 
notic age regression must take ac- 
count of the apparent facts that re- 
gression can be simulated in the wak- 
ing state; that the amount of regres- 
sion is similar whether it is simulated 
under hypnosis, or suggested under 
hypnosis; and that regression becomes
more “accurate” as the response
regressed to becomes more
specific. It seems likely that neither 
the role-taking, nor the habit reacti- 
vation theories, taken separately, 
will account satisfactorily for the ob- 
served facts. Thus, the role-taking 
theory is clearly embarrassed by find- 
ings such as those of True (1949) in 
relation to the recall of factual in- 
formation under hypnosis, those of 
McCranie et al. (1955) in relation to 
the reactivation of physiological re- 
sponses, and those of Reiff and 
Scheerer (1959) in relation to the 
Word Association test. It is probably 
necessary to recognize that both the- 
ories must be invoked, each account- 
ing for some, but not all, of the facts. 
In this connection, it may be noted 
that a satisfactory explanation of the 
facts awaits the formulation of a 
valid general theory of behavior 
under hypnosis. Since, however, 
workers in this field are still strug- 
gling to elucidate basic concepts 
(Barber, 1958; Sutcliffe, 1960), it is 
probable that a more careful and 
thorough examination of the phenomena
encountered in hypnotic age
regression will provide data highly 
relevant to this aim. 

We may conclude, therefore, with 
some general suggestions concerning 
future research in this field. First, a 
crucial area of research is the problem 
of partial versus complete age regres- 
sion. It seems clear that complete re- 
gression would be extremely unlikely 
in relation to complex items of behav- 
ior, since early habit-structures in- 
volving complex skills would surely 
be affected by subsequent growth of 
the skill, if only through the process 
of retroactive inhibition. On the 
other hand, relatively isolated items 
of knowledge (such as knowing on 
what day one’s fourth birthday fell) 
might easily survive relatively un- 
changed by subsequent learning, to 
be reactivated under appropriate 
conditions. Mention has already
been made of the necessity for a close 
examination of the conditions under 
which regression is induced. 

Second, more attention should be 
paid to an analysis of simple aspects 
of behavior, rather than complex 
ones. For example, instead of using 
measures such as the Binet, attention 
could be concentrated on developmental
schedules, which objectively
record the presence or absence
of specific items of behavior at dif- 
ferent age levels. The suggestions for 
research made by Reiff and Scheerer 
(1959) are particularly valuable in 
this connection. The use of condi- 
tioning techniques, as exemplified by 
the study of McCranie and Crasil- 
neck (1955) should also yield crucial 
information. 

Third, much more attention should 
be paid to a careful description of the 
total behavior of S in the regressed
state. Much has been made of the
fact that S behaves in a manner appropriate
to his regressed age. Almost
invariably, however, the description
is highly selective. For example,
as Orne (1951) has pointed
out, regression implies also that all 
knowledge acquired subsequently to 
the age to which S has been regressed 
should be unavailable. In other 
words, S should no longer be cog- 
nizant of current affairs, political, so- 
cial, or otherwise. It is extremely 
curious that no information of a con- 
crete nature on this vital point is 
available, except for a few vague, gen- 
eral assertions. 

Fourth, no attention has been paid 
to the study of the behavior of S in 
the regressed state over a substan- 
tial period of time. Practically all 
investigators have restricted their observations
to laboratory situations.

Fifth, the fact that hypnosis apparently
facilitates simulated regression, but that
the addition of direct suggestion does not
improve on hypnotically simulated regression,
requires further exploration.
Thus far, evidence relating to this 
important point is restricted to re- 
sults from a single study (Crasilneck 
& Michael, 1957). 

The importance of the phenomena 
encountered in hypnotic age regres- 
sion, and the advances in technique 
which characterize the investiga- 
tions of Reiff and Scheerer (1959) 
should surely lead to a revival of 
interest in this problem. 

Early evidence (Crew & Mirskaia,
1931) suggested that the population 
size of many mammalian species and 
especially of rodents is self-limiting. 
In 1952, Calhoun demonstrated den- 
sity limitation in a confined popula- 
tion of Norway rats. The population 
he observed never exceeded 200, even 
though he estimated the growth po- 
tential in terms of shelter, space, and 
food to be well over 5,000. Subse- 
quently, a number of studies have 
demonstrated that the reproductive 
capabilities of rodents living in high- 
density populations are impaired 
(e.g., Chitty, 1955; Christian, 1959c; 
Christian & LeMunyan, 1958; Hoff- 
man, 1958; Kalela, 1957; Louch, 
1956; Southwick, 1955a, 1955b; 
Strecker & Emlen, 1953). Research 
into the mechanisms by which den- 
sity limitation is accomplished re- 
veals an interaction between density, 
endocrine function, and behavior 
that has major implications for the 
behavior theorist working with ani- 
mal subjects. This research is re- 
viewed in the present paper. 

In a rather comprehensive theory 
based on Selye’s conception of a 
general adaptation syndrome, Christian
(1950) implicated the endocrine
system in limitation of population 
density. He proposed that the ob- 
served triphasic population cycle con- 
sisting of an initial growth of popu- 
lation followed by a period of stabil- 
ity and then a period of decline could 
be accounted for by a stimulus feed- 
back reaction described by Selye 
(1946), involving the endocrine sys- 
tem and particularly a pituitary- 
adrenocortical-gonadal axis. Accord- 
ing to Selye, certain pituitary-adrenal- 
gonadal effects are produced by all 
general stressors and are in propor- 
tion to the severity of the stress. 
These effects consist in part of hyper- 
activity of the pituitary and adrenals 
and hypoactivity of the gonads. 
Christian reasoned that if population 
density were a stressor, it would be 
inversely related to gonadal activity 
and therefore to reproductive behav- 
ior, as well as to other factors affect- 
ing survival. Such relationships 
could account for the apparently self- 
limiting nature of density of popula- 
tion and would help to account for 
the triphasic population cycle. Under 
conditions of low density of popula- 
tion and in otherwise favorable cir- 
cumstances, gonadal and reproduc- 
tive activity would be high, resulting 
in an expanding population. The in- 
creasing population density, acting 
as an increasing stressor, would
eventually reduce reproduction to 
the point that deaths would match 
births. The population would reach 
equilibrium at that point and would 
enter the second, stable phase of the 
population cycle. Such stability 
would be maintained until the popu- 
lation was subjected to an additional 
stressor, such as increased daylight or 
increased cold occurring with sea- 
sonal change. The additional stressor 
could destroy the equilibrium and 
precipitate a more or less rapid de- 
cline of population, partially by its 
effects on reproduction rate and par- 
tially by other lethal effects of the in- 
creased stress. 
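
Christian's feedback argument can be made concrete with a toy discrete-time model. The sketch below is not taken from Christian (1950) or Selye (1946). It simply assumes, for illustration, that per-capita births fall linearly with the logarithm of population size, that deaths occur at a constant per-capita rate, and that a later "additional stressor" raises the death rate. The function names and all parameter values (b0, c, d_base, d_stress, stress_start) are hypothetical.

```python
import math


def simulate(n0, steps, b0=0.5, c=0.1, d_base=0.1, d_stress=0.2, stress_start=None):
    """Toy density-dependent population model (hypothetical parameters).

    Per-capita births fall linearly with ln(N); per-capita deaths are constant,
    except that from step `stress_start` onward an extra stressor raises them.
    Returns the population size after each step, starting with n0.
    """
    n = n0
    history = [n]
    for t in range(steps):
        stressed = stress_start is not None and t >= stress_start
        deaths = d_stress if stressed else d_base
        # Density acts as a stressor that depresses the birth rate.
        births = max(0.0, b0 - c * math.log(n))
        n = n * (1.0 + births - deaths)
        history.append(n)
    return history


def equilibrium(b0, c, d):
    """Population size at which births equal deaths: b0 - c*ln(N) = d."""
    return math.exp((b0 - d) / c)


history = simulate(2.0, 400, stress_start=200)
print(f"before stressor: {history[200]:.1f} (predicted {equilibrium(0.5, 0.1, 0.1):.1f})")
print(f"after stressor:  {history[400]:.1f} (predicted {equilibrium(0.5, 0.1, 0.2):.1f})")
```

With these assumed values the population rises, levels off where births match deaths (N* = e^4, about 54.6), and then declines toward a lower equilibrium (e^3, about 20.1) once the stressor is applied, giving growth, stationary, and decline phases. For simplicity the model routes the added stressor only through mortality, whereas the theory also attributes part of the decline to reduced reproduction.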


NATURAL POPULATIONS 


Support for Christian’s theory was 
provided by the discovery of a rela- 
tionship between stages of the den- 
sity cycle and adrenal weight in nat- 
ural populations of Norway rats 
(Christian & Davis, 1956). Rats 
from 21 Baltimore city blocks were 
systematically sampled and their
population numbers estimated over a 
period of months. Since the rats sel- 
dom cross streets, each block was es- 
sentially an independently varying 
population. At the time of sampling,
each of these populations was classified
as belonging to one of five successive
stages of a population cycle:
low stationary, low increasing, high 
increasing, high stationary, and de- 
creasing. Beginning with the low in- 
creasing stage, a progressive increase 
in adrenal size was found for the suc- 
cessive stages. The relationships in 
the low stationary stage were some- 
what ambiguous, but perhaps appro- 
priately so if it is remembered that 
this stage constitutes the end of the 
population cycle as well as its begin- 
ning. To the extent that adrenal size 
correlates with adrenal activity, the 
results strongly suggest a progressive
increase in stress, and in endocrine 
response to it, as the population cycle 
progresses. However, in this study, 
significant weight changes were not 
found in the thymus and pituitary 
glands, which normally show response
to prolonged stress. Since nutritive
elements were found in abundant
amounts, social rather than
strictly biological factors were pre- 
sumed to be primary in determining 
the differences in adrenal weights. In 
a study of a rural population of Nor- 
way rats, Christian and Davis (Chris- 
tian, 1959c) found a correlation of .90 
between population density change 
and adrenal weight change. In this 
study, pituitary weight also changed 
with population size and correlated 
.99 with adrenal weight.

Louch (1956) reported similar find- 
ings in two natural populations of 
meadow voles. Population densities 
were estimated from monthly live 
trappings. In both populations, 
adrenal weight correlated positively 
with density. Eosinophil count, a 
blood-fraction measure known to
vary inversely with adrenocortical 
activity, was found to have an in- 
verse relationship to population den- 
sity. The absence of apparent food 
shortages again suggested that non- 
nutritive factors were primarily re- 
sponsible for the observed correla- 
tions. 


POPULATION SIZE 


Christian has suggested that a
logarithmic relationship exists between
size of population and the endocrine
or related effects. A number
of his laboratory and field researches 
support this contention. In one 
study (Christian, 1955a), he placed 
weanling male mice in groups of 1, 4, 
6, 8, 16, and 32 for one week. Adrenal 
weight in all cases except for the 
population of 32 showed a linear relationship
to the logarithm of the
population size. Adrenal weight for 
the largest population showed a de- 
cline from the next largest. This de- 
cline was initially interpreted as due 
to “social structure deterioration,” 
representing some decrease in stress 
at the greatest density. More re- 
cently, however, Christian (1959c) 
has found that the relative decrease 
in adrenal weight at this high density 
is due to a loss in lipid content of the 
cortical cells, indicating an intense 
activation of the adrenocortex. The 
trend of increased adrenocortical ac- 
tivation with increased density would 
therefore appear to hold for all limits 
tested. Wild and tame mice show 
similar types of response to artificially
established densities (Christian,
1955b), although both initial
adrenal size and response to density
are greater in the wild mice. Both responded
to increased density by increased
adrenal size, sex gland
atrophy, and thymus atrophy (an- 
other index of stress). 
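
The logarithmic relationship can be stated compactly. The form below is an illustration, not Christian's own notation: W is adrenal weight, N is group size, k is the ratio between two group sizes, and alpha and beta are placeholder constants, since the text reports no fitted values.

```latex
W(N) = \alpha + \beta \ln N
\quad\Longrightarrow\quad
W(kN) - W(N) = \beta \ln k
```

Under this form the expected change depends only on the ratio of group sizes, so doubling a group from 4 to 8 animals should add the same increment of adrenal weight as doubling it from 16 to 32. Equal ratios, not equal differences, in group size would be expected to produce equal effects.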

Christian (1956a) found similar 
relationships in a study of free-grow- 
ing populations of laboratory reared 
mice. Beginning with a few pairs in 
large cages amply supplied with food 
and water, some populations were 
allowed to reach an apparently self- 
determined asymptote. This was 
much below the number of animals 
that could be supported by the food 
and cover available. Other popula- 
tions were allowed to reach approxi- 
mately half of the expected maximal 
density of population. Still other 
animals were derived from segregated 
pairs and were maintained as segre- 
gated pairs following weaning. In 
the free-growing populations, the 
growth curves were sigmoid. Birth 
rate and survival of infants declined 
in proportion to the logarithm of the 
size of the population. As was found
in both the wild and the artificially 
constituted populations, increased 
adrenal weight was found to be asso- 
ciated with increased density of pop- 
ulation. Histological examination re- 
vealed that the increase in weight was 
due primarily to hypertrophy and 
hyperplasia of the zona fasciculata, 
suggesting greater adrenocortical 
functioning. In the young male mice, 
part of the higher adrenal weight was 
due to delayed involution of the X 
zone, a transitory layer of the adrenocortex
found in young mice. Since this involution
is brought about by
androgens, the observed delay in in- 
volution suggests that androgen pro- 
duction, and therefore the onset of 
puberty, occurs at a later age in male 
mice from high-density populations. 
It further supports the assumption 
that a pituitary-adrenal-gonadal in- 
teraction system is involved in con- 
trol of population density. Repro- 
ductive organs of the mature high- 
population-density males were also 
lighter and spermatogenesis was par- 
tially suppressed as compared with 
the low-density controls. 

Other results implicating endocrine 
involvement in population control 
were also found. The decrease in sur- 
vival rate of infants in the dense pop- 
ulations was attributed to deficient 
lactation of the mothers. The infants 
who died usually did so in 10 to 14 
days after birth and were found to be 
uninjured but with empty stomachs. 
The survivors were weaned early, ap- 
peared grossly stunted, and were in 
poor condition. If production of pro- 
lactin, one of the gonadotrophic hor- 
mones of the pituitary, is suppressed, 
along with suppression of the other 
gonadotrophic hormones, then re- 
duced lactation and the observed in- 
fant mortality and stunting would be 
expected. Suppression of the gonadotrophins
would also account for the
observed reduction in numbers of 
pregnancies and numbers of embryos 
per pregnancy in the denser popula- 
tions and for the increase in numbers 
of resorbing embryos per pregnancy 
observed at autopsy. These sugges- 
tions are in general agreement with 
the recent findings of Helmreich 
(1960). In that study, grouped female
deer mice showed increased resorp- 
tion of implanted embryos, although 
the incidence of pregnancy and the 
number of embryos implanted were 
not different from those of the iso- 
lated controls. 

Of considerable interest is Chris- 
tian’s additional finding, consistent 
with Chitty’s (1955) speculations, 
that the effects of decreased body 
weight, intrauterine mortality, de- 
creased litter size, and reduced abil- 
ity to lactate were still observable in 
the first- and second-generation offspring
of grouped animals. The effects
presumably were transmitted through
a decreased, or nutritionally altered,
milk supply in the mother.
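
The text does not say which sigmoid function best describes these growth curves. The logistic equation is shown below only as a familiar example of a self-limiting, S-shaped curve. N is population size, t is time, N_0 is the starting size, and the intrinsic rate r and the ceiling K are assumed symbols, not quantities reported by Christian or Louch.

```latex
\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right),
\qquad
N(t) = \frac{K}{1 + \left(\dfrac{K - N_0}{N_0}\right) e^{-rt}}
```

In this model growth is fastest at N = K/2, the inflection point, and slows to zero as N approaches K. If the curves were logistic, the populations Christian stopped at roughly half the expected maximal density would have been sampled near the point of fastest growth, while the self-limited populations had approached their ceiling.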

Louch (1956) carried out a study of 
three freely growing but confined 
populations of meadow voles that in 
many respects parallels Christian's 
studies of house mice. During the 
period of observation, the three 
meadow vole populations had access 
to abundant food and nesting sup- 
plies. The growth curves were sig- 
moid, much like those found by 
Christian, although the rate of growth 
varied considerably among popula- 
tions. Although number and size of 
litters were not significantly corre- 
lated with density, several other fac- 
tors that tended to limit population 
size did vary as expected. Litter mortality
was high under conditions of
dense population. This was attributed
to the mother’s reduced ability
to lactate, to her greater tendency to
eat or abandon her pups, and to in- 
creased trampling and disturbance of 
the litters by other animals. Adult 
mortality also correlated positively 
with size of population and was most 
pronounced during periods of popu- 
lation decline. The correlation was 
due at least in part to an apparent in- 
creased susceptibility to disease in 
dense populations. Amount of fight- 
ing and wounding increased with 
density. There was a tendency for 
fecundity, as measured by number of 
mice with scrotal testes or perforate 
vaginae, to correlate inversely with 
density. This inverse correlation was 
significant in one population, ap- 
proached significance in another, and 
was opposite in sign and insignificant 
in the third. At high densities, males 
competed aggressively for females in 
heat, by chasing, fighting, and push- 
ing each other away from the female. 
As a result, the number of mountings 
increased but few mountings led to 
completed copulation. Lowered eo- 
sinophil counts at the higher densi- 
ties, together with the other findings, 
suggested, as do Christian’s results, 
the direct involvement of the pituitary-adrenal-gonadal
axis in the dynamics of population density.

Although many effects appear to
follow a logarithmic relationship to 
population size, factors other than 
density alone can be important. 
Southwick (1955a) reports, for ex- 
ample, that different freely-growing 
populations of wild trapped house 
mice confined under essentially similar
conditions varied as much as fivefold
in maximum size of population
attained. He attributed the differ- 
ences to uncontrolled social and ge- 
netic factors. Christian’s finding 
(1955b) that wild mice show a more 
marked adrenal response to density 
than do laboratory mice suggests the 
importance of genetic factors. Other
investigators, while not questioning 
the fact of density limitation, have 
questioned whether or not size of 
population is the crucial variable. 
Several studies have been directed 
toward assessing possible alternative 
explanations. 


WOUNDING AND SOCIAL RANK 


In mice living 4, 8, or 16 to a cage, 
Southwick and Bland (1959) found 
no significant differences among the 
groups in adrenal weight unless 
wounded animals were compared 
with nonwounded. The wounded ani- 
mals were found to have significantly 
heavier adrenals. They conclude that 
wounding is the essential operant 
in adrenocortical change and that 
higher density acts indirectly to increase
adrenal size by creating a situation
in which fighting and wounding
are more likely to occur. Chitty,
Chitty, Leslie, and Scott (1956) 
found similar evidence. Young male 
voles were put in contact with old 
mated pairs for periods of about 
2 hours a day for several days. 
Fighting, chasing, and wounding typ- 
ically occurred. The more severely 
wounded animals had a higher liver, 
spleen, and adrenal weight and a 
smaller thymus and body weight 
than did the less severely wounded. 
Clarke (1953), too, found similar ef- 
fects when voles were introduced into 
cages containing a pair of “resi- 
dent” animals. The newly intro- 
duced voles were viciously attacked 
and wounded; the longer the period 
of exposure, the more severe were 
their wounds and glandular changes. 
These studies implicate the physical 
effects of fighting and wounding as 
crucially important. 

Contradictory results have been 
obtained in other studies, in which 
no relationship was found between
amount of wounding and adrenal
size and in which glandular effects 
occurred with little or no fighting and 
no wounding at all. Christian (1959b) 
measured adrenal hypertrophy and 
presence or absence of scarring in 50 
populations of four, five, or six albino 
male mice each. When adrenal 
weight was corrected for body weight, 
adrenal hypertrophy, found after 1 
week of grouping, did not reflect 
either the severity of fighting or the 
amount of injury received. Barnett 
(1955), working with two strains of 
rats, took movies of fighting be- 
havior, territoriality establishment, 
and the working out of hierarchies 
within groups. Histological examina- 
tion of the adrenals revealed hyper- 
trophy in the subordinate animals 
only; this hypertrophy was related 
to the social position within the 
group but not to the amount of fight- 
ing. Christian and Davis (Christian, 
1959c) also found that dominant 
Norway rats showed little adrenal 
hypertrophy, even though they 
fought as much as or more than sub- 
ordinate animals that did show adre- 
nal change. Southwick and Bland 
(1959) found adrenal hypertrophy 
more likely to occur in males housed 
with females than in males housed 
with other males, even though fight- 
ing was not observed and wounding 
did not occur in either case. When 
wild male house mice were grouped 
six to a cage 4 hours a day for several 
days (Davis & Christian, 1957), a sig- 
nificant negative relationship was 
found between social rank and adre- 
nal weight. A similar relationship 
was found by Vandenbergh (1960) 
using eosinophil levels and adrenal 
weights as indices of adrenocortical 
activity. 

These studies suggest that social 
rank is an important factor in deter- 
mining endocrine response. Amount 
of wounding and social rank tend to
be related in recently grouped popu- 
lations, so that some correlation with 
wounding might be expected under 
some circumstances. Also, the height 
of the pyramid towering over a low- 
status animal, as well as the number 
of low-status animals, is a function of
the size of the population and would
lead to an expected correlation be- 
tween average endocrine response 
and population size. If social status 
were crucially important in determining
the endocrine response, then the
reproductive capacity and stress vulnerability
of the low-status animals
would be affected first in an expand- 
ing population. A selective advan- 
tage would therefore accrue to those 
characteristics making for high status 
in the population. 
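
The argument that rank-dependent effects would still produce an average correlation with population size can be checked with a small sketch. It assumes, purely for illustration, a strict linear dominance hierarchy in which each animal's endocrine response is proportional to the number of animals ranked above it. The function names, the `per_dominant` scale, and the threshold are hypothetical and are not drawn from any of the studies cited.

```python
from statistics import mean


def rank_based_responses(group_size, per_dominant=1.0):
    """Response of each animal in a strict linear hierarchy (hypothetical model).

    Rank 0 is the dominant animal; an animal at rank r has r animals above it,
    so its response is per_dominant * r.
    """
    return [per_dominant * rank for rank in range(group_size)]


def count_above(responses, threshold):
    """Number of animals whose response exceeds a hypothetical threshold."""
    return sum(1 for r in responses if r > threshold)


for n in (1, 4, 8, 16, 32):
    responses = rank_based_responses(n)
    print(n, mean(responses), max(responses), count_above(responses, threshold=5.0))
```

In this toy model the dominant animal's response stays at zero whatever the group size, yet the mean response, (N - 1)/2 times `per_dominant`, grows with N, and so does the number of animals above any fixed threshold. An individual effect that depends only on rank can therefore still appear as a population-level correlation with density.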

An interesting parallel appears be- 
tween these data and data showing 
that hormonal variations in the blood 
stream are related to changes in dom- 
inance. If the initial status differ- 
ences among animals are not too 
wide or too greatly solidified by learn- 
ing, the administration of androgen 
to low-ranking normal and castrated 
animals increases the dominance 
status in both the male and the fe- 
male (see Beach, 1948; Bindra, 1959). 


LIVING SPACE 


In studies of white Leghorn chick- 
ens, Siegel (1959a, 1959b, 1960) 
placed different numbers of birds in 
equal-sized pens and found that the 
more crowded groups had larger ad- 
renals and produced fewer eggs. In 
some comparisons, he found smaller 
pituitary weights and histochemical 
evidence of greater adrenocortical se- 
cretion in the more crowded groups. 
Siegel ascribes these relationships to 
differences in floor space per animal. 
They are highly consistent with the 
data from rodent populations in relating
density to endocrine and reproductive
response. In Siegel's
studies, as in many of the rodent 
studies, population size and living 
space per animal are confounded, 
leaving open the possibility that size 
of population rather than living space 
is the crucial variable affecting endo- 
crine response. Other studies tend to 
rule out the importance of living 
space as an independent variable. 
Christian found that the same posi- 
tive relationship between population 
size and adrenal weight held for mice 
even when floor space was increased 
42 times. In an unpublished study, 
we have found that the relationship 
held when living space per mouse 
was exactly equated, i.e., when a 
population of 10 animals was housed 
in twice as much space as a popula- 
tion of 5 animals and in 10 times the 
space of individually housed animals. 


NOVELTY 


A few studies have been concerned 
with the effect of stimulus change on 
endocrine response, on the assump- 
tion that a larger population offers 
the possibility of greater novelty and 
that novelty might be the important 
variable in the density studies. Chris- 
tian and Davis (1955) tested the pos- 
sibility that density reduction might 
be as stressful as density expansion. 
Rat populations in three Baltimore 
city blocks were reduced by trapping 
to about one-half of their estimated 
maximum and were maintained at 
that level for several months. An 
over-all reduction rather than in- 
crease in adrenal weight was found, 
suggesting that the population re- 
duction was not stressful, at least as 
measured by changes in adrenal 
weight. It should be noted that pos- 
sible transitory endocrine changes 
immediately following the density 
reductions were not measured. 
A study by Siegel (1959c) also indicates
that a density reduction is
equivalent to a reduction in stress, 
as measured by adrenal weight re- 
gression. Twenty-five birds from 
each of two different groups of white 
Leghorn female chickens, housed 50 
and 150 birds per pen, were sacrificed 
over a 15-day period. As expected, 
adrenal hypertrophy was more ex- 
tensive in birds coming from the 
larger group. In both populations 
adrenal weights were significantly re- 
lated to the day of sacrifice, with re- 
gression equations indicating that ad- 
renal glands weighed progressively 
less as autopsies continued over the 
15-day period and population density
decreased. 

Vandenbergh (1960) found a tran- 
sitory drop in eosinophil count fol- 
lowing grouping of mice. This re- 
sponse, indicative of increased ad- 
renocortical secretion, reached a peak 
approximately 4 hours after group- 
ing and had largely disappeared by 
the second week. Change in adre- 
nal weight was less rapid and less 
transitory. Christian (1959a) found 
an initial increase in urinary corti- 
costeroid levels in guinea pigs follow- 
ing grouping, followed by a return to 
pregrouping levels within 3 days. 
Other investigators (Holcomb, 1957; 
Levine, 1959; Mason, 1959; Vogt, 
1951) have found that almost any 
shift in stimulation will alter eosino- 
phil and corticosteroid levels. It may 
therefore be that either a density in- 
crease or a density decrease would 
result in an initial rise in corticoid 
secretion, whereas only an increase in 
density would result in noticeable ad- 
renal hypertrophy and other gross 
morphological changes. The possibly 
transitory stimulating effect of den- 
sity reduction has not as yet been 
demonstrated, however. Some evi- 
dence that novelty is not the crucial 
factor in the more persistent morphological
changes associated with high
density is the differentially greater 
response of the low-status animals 
(see previous discussion). It seems 
probable that the dominant animals 
encounter as many or more novel 
situations as do the more socially re- 
stricted low-status animals, and yet 
their glandular response is less. 


EFFECT OF TRANQUILIZERS 


One study has been done on the ef- 
fect of tranquilizers on endocrine re- 
sponse to population density (Chris- 
tian, 1956b). Mice receiving reser- 
pine in their drinking water showed 
less extensive glandular alteration 
than did similarly grouped mice not 
receiving the tranquilizer. The results
are interpreted as supporting the hy- 
pothesis that the density-related 
changes in endocrine function are 
due to sociophysiological response to 
group pressures. 


FEMALE ESTRUS CYCLE


Several studies have focused on the 
effect of population density on the 
female estrus cycle. van der Lee and 
Boot (1955, 1956) found that hous- 
ing female mice four to a cage often 
prolonged by several days the normal
4- to 6-day recurrence of estrus.
This temporary suspension of estrus 
of grouped females was confirmed by 
Dewar (1959), Lamond (1958, 1959), 
and Whitten (1956, 1957, 1958, 
1959). Whitten (1959) reports sus- 
pension of estrus for as long as 40 
days when females are grouped 30 
to a cage, with the estrus cycles 
promptly returning when the mice 
are separated into individual cages. 

These results suggest the possibil- 
ity that prolonged female diestrus oc- 
curs in dense populations and is one 
mechanism of density control. There 
is, however, evidence that tends to 
contradict such a conclusion. Whitten
(1956, 1957, 1959) and Lamond
(1959) have demonstrated that the 
introduction of a male into the fe- 
male group or that the placing of a 
previously grouped female with a 
male will terminate the diestrus and 
will usually lead to pregnancy in a 
few days. Mating occurred predomi- 
nantly on the third night after pair- 
ing when previously grouped females 
were placed with a male, indicating 
that contact with the male termi- 
nated a diestrus period and initiated 
an estrus cycle (Whitten, 1956, 
1959). In contrast, matings with fe- 
males previously housed individually 
were more randomly distributed 
among the first four nights, indicat- 
ing the pre-existence of estrus cycles 
unrelated to the introduction of the 
male. 

The ability of a male to terminate 
the diestrus of grouped females and 
the absence to date of reports of ob- 
served density-related increase in 
diestrus of females in mixed populations
suggest that such diestrus is not a
predominant factor in population
control. The evidence is nevertheless
consistent in showing that the grouping of
females results in diestrus, and this
effect may be an important factor in
determining the endocrinological or 
behavioral status of subjects used in 
laboratory settings. Whitten (1959) 
posits that some of the effects ob- 
served are mediated by the pituitary- 
gonadotrophic function, a theory 
that would relate these results closely 
to other observed effects of popula- 
tion density on endocrine function. 

Controversy exists concerning the 
nature of the diestrus of the grouped 
females. Some investigators (Dewar, 
1959; van der Lee & Boot, 1955, 
1956) have attributed the diestrus to 
pseudopregnancy. Others (Lamond, 
1959; Whitten, 1956, 1957, 1958) 
consider the condition to differ in
crucial respects from true pseudo- 
pregnancy, being easily terminable 
at any time by the introduction 
of a male, being associated with re- 
duced weight of ovaries and uterus 
and with reduced number or absence 
of corpora lutea, and being accom- 
panied by mucified vagina. With the 
possible exception of mucified vagina, 
none of these effects would be ex- 
pected with true pseudopregnancy 
(Nalbandov, 1958; Turner, 1955). 

Whitten (1957) argues that severe 
stress reactions are not present in the 
grouped females, since they appear 
healthy, retain their body weight, re- 
turn to estrus rapidly upon isolation 
or pairing with a male, and become 
pregnant without apparent diffi- 
culty. Christian (1960) cites recent 
evidence, however, suggesting some 
endocrine response to grouping of fe- 
males. As compared to isolated con- 
trols, he found mild hyperplasia 
of the adrenal fasciculata-reticularis 
zone in grouped females, suggesting 
increased ACTH production by the 
pituitary. He also cites other evi- 
dence suggestive of increased pitui- 
tary-adrenal response. The response 
was not, however, so great as that 
observed in groupings of males or of 
mixed sexes. 

Present evidence suggests that ol- 
factory cues are important in initiat- 
ing diestrus in grouped females, in 
terminating diestrus, in controlling 
sexual behavior leading to pregnancy, 
and even in preserving or disrupting 
pregnancy after it occurs. Lamond 
(1958) and Whitten (1959) found 
that females housed singly but sepa- 
rated from each other only by a par- 
tition showed disruption of the estrus 
cycle. van der Lee and Boot (1956) 
found that olfactory bulb removal 
reduced the number of females that 
became diestrus under conditions of 


POPULATION DENSITY AND ENDOCRINE FUNCTION 


grouping. Whitten (1956) found that 
mating of a grouped female could be 
shifted predominantly to the first 
night, instead of the third night, fol- 
lowing pairing with a male if a male 
were enclosed within a small basket 
in the female cage for the 2 days 
prior to pairing or if the females were 
placed in a cage recently contami- 
nated by males. Lamond (1959) re- 
ports that the number of litters born 
to anosmic mice is significantly 
smaller than for either normal or 
blinded animals. Bruce and Parrott 
(1960) report that pregnancy is 
blocked in a high proportion of re- 
cently mated intact female mice ex- 
posed to strange males, but not in 
anosmic females so exposed. Whether 
or not the olfactory cues that appear 
to mediate these effects operate 
through an effect on the pituitary- 
adrenal-gonadal system has not yet 
been established. 


EFFECTS ON NONREPRODUCTIVE 
BEHAVIOR 


If density of population affects en- 
docrine function, then it will almost 
inevitably affect behavior studies. 
The relationships between popula- 
tion density and behavior will not be 
reviewed, other than briefly to indi- 
cate that the effects may be crucial 
for many studies. With regard to 
learning ability, for example, Marx 
(1956) found that grouped rats could 
learn a vigorous lever-pressing re- 
sponse faster than individually 
housed animals. With regard to 
“emotionality,” Bovard and New- 
ton (1956) found that rats living in a 
group showed more defecation and 
vocalization when transported by the 
experimenter. Much work has appeared
and is appearing on the effects
of early handling on later behavior. 
The evidence (e.g., Levine, 1959)
that the early handling effects are
mediated by endocrine response to a
“stressful” situation suggests the important
role that endocrine function,
and therefore population density,
may play in determining such diverse variables
as learning ability, survival, and
brain chemistry. Clearly, the field is
ripe for more experimental work. 
Also, clearly, the experimenter who 
draws his subjects haphazardly from 
colony cages containing varying num- 
bers of animals is introducing into 
his study an uncontrolled variable 
that may crucially affect his ob- 
tained results. 


SUMMARY 


Population density has been shown 
to affect endocrine function, being 
positively related to adrenal hyper- 
trophy and adrenocortical activity 
and negatively related to gonadal 
and mammary activity. Other fac- 
tors being equal, many reactions ap- 
pear to vary as the logarithm of the 
population size. However, there is 
some evidence that marked genetic 
differences in response exist as well 
as some evidence that population size 
is only indirectly a causative agent. 
Amount of wounding does not appear 
to be crucially important, although 
social rank, which at times correlates 
with amount of wounding, is a good 
predictor of individual response to 
population pressures. Mechanical re- 
striction of living space appears to be 
unimportant within broad limits. 
Response to novelty may account for 
some transitory endocrine reactions 
but seems unlikely to be the crucial 
variable in less transitory morpho- 
logical effects of population size. 
Grouping effects on the female estrus
cycle appear related more to olfac- 
tory cues and sexual composition of 
the group than to population density 
per se. One effect of the endocrine 
response appears to be an alteration 
of reproductive capacity such as to 
provide a self-limiting control of 
population size. Learning ability, 
emotionality, and other behavior
may also be altered by variations in 
density of population. 


DOES THE HEART LEARN?


In our time of cohabitation of various
sciences one may wonder about
the kind of affairs psychology has 
with some of the more firmly estab- 
lished disciplines. Other sciences may 
very well believe that the progeny of 
a relationship with psychology would 
necessarily be illegitimate. Or per- 
haps, at best, that psychology would 
have everything to gain and nothing 
to give. Must psychology be the 
protégée, or does it have unique
techniques to share? It is the aim of 
this paper, by way of presenting some 
experimental findings, to suggest that 
certain techniques of modern psy- 
chology can be useful in the analysis 
of problems of cardiovascular physi- 
ology. 

Detailed encouragement to the 
psychologist to use his techniques in 
the physiological laboratory comes 
from current adjustments in physiological
thinking. Reviews of recent
circulatory research by Rushmer
(1955) and by Rushmer and Smith (1959)
point out that the cardiovascular
system is not altogether faithful to a 
few classical laws based largely upon 
simple hydraulic principles (Bain- 
bridge, 1915; Patterson, Piper, & 
Starling, 1914). Rather it appears 
that this system is true to many 
principles roving about on several 
levels of analysis, among them the 
principles derived from conditioning 
procedures. 

Gantt’s translation of The Internal 
Organs and the Cerebral Cortex by 
Bykov (1957) could be the signal for 
a methodological revolution in cer- 
tain phases of biological analysis and 
control, wherein the physiological re- 
actions of the intact organism are 
modified by conditioning techniques.
Of particular interest in this book for 
our present discussion is the chapter 
on circulatory adjustments. Several 
experiments are cited wherein car- 
diac, vasomotor, and even splenic re- 
sponses are conditional upon extero- 
ceptive stimuli presented by the ex- 
perimenter. 

In contrasting the results men- 
tioned in the Bykov book with those 
from other sources we are brought to 
what is perhaps the most paradox- 
ical feature of cardiovascular condi- 
tioning, the form of the conditioned 
cardiac response. Granting that the 
heart does learn, just what is it that is 
learned? 


WHAT DOES THE HEART LEARN? 


CR-UCR Similarity. Soviet inves- 
tigators generally suggest that the 
conditioned response (CR) closely re- 
sembles the unconditioned response 
(UCR) as illustrated by an experi- 
ment of Petrova (Bykov, 1957). An 
auditory stimulus (whistle) was com- 
bined with intravenous injections of 
nitroglycerin. Because the act of in- 
jecting the fluid would act as a con- 
ditioned stimulus, its effect was ex- 
tinguished with repeated intravenous 
injections of normal saline. The 
whistle, on the other hand, was al- 
ways sounded after the nitroglycerin 
had been injected (but before the ef- 
fect of the drug was manifest). After 
about 100 pairings of the whistle and 
nitroglycerin the whistle presented 
alone produced changes typical of 
those elicited by the drug (acceler- 
ated heart rate, decrease in QRS volt- 
age, and augmented P and T waves). 

Delov (Bykov, 1957) demonstrated
that a conditioned stimulus
may produce a very different re- 
sponse from the above when com- 
bined a number of times with a drug 
of different consequences. In this 
experiment the conditioned stimulus 
(CS) was actually the stimulus com- 
plex associated with the injection, 
while the unconditioned stimulus 
(UCS) was a 0.2-gram injection of 
morphine. After 20 to 30 injections, 
the CS given without morphine pro- 
duced the same changes in the elec- 
trocardiogram as those produced by 
morphine (deceleration in heart rate 
and marked reduction in the P de- 
flection). 

Additional experiments (Bykov, 
1957) showing the similarity of UCR 
and CR for other drugs have been 
conducted by Samarin (strophanthin)
and Levitin (acetylcholine and
epinephrine). 

Other investigators have taken a 
different view of the form of the heart 
rate CR. For example, in some ex- 
periments with human subjects by 
Zeaman, Deane, and Wegner (1954) 
and Zeaman and Wegner (1954), it 
was suggested that the CR resembles 
the UCR at the time of the UCS 
(shock) termination. In accord with 
this hypothesis a 2-second shock gave 
an accelerated heart rate CR and a 
6-second shock gave a decelerated 
CR (since the UCR at shock termina- 
tion was accelerating or decelerat- 
ing, respectively). When other shock 
values were used in a later experi- 
ment (Zeaman & Wegner, 1958) this 
hypothesis was not upheld. It was 
predicted from the hypothesis that 
no conditioning would occur for a 
very short shock (0.1 second) which 
did not allow a change in heart rate 
before its termination, or for a very 
long shock (15 seconds) which al- 
lowed the heart rate to return to 
normal by the time it was terminated.
When conditioning did occur,
these investigators revised their hy- 
pothesis to suggest that to some ex- 
tent large UCRs tend to give acceler- 
ative CRs and small UCRs decelera- 
tive CRs. 

Decelerative CR. A decelerative 
heart rate CR in human subjects is 
consistently reported by Bersh, Not- 
terman, and Schoenfeld (1953, 1956a, 
1956b, 1956c, 1957a, 1957b; Notter- 
man, Schoenfeld, & Bersh, 1952a, 
1952b, 1952c). Their procedure was 
essentially the same as that used by 
Zeaman and Wegner (1954) with 
which a decelerative CR was ob- 
tained (1-second CS, 6-second CS- 
UCS interval, and a 6-second UCS). 
Their UCS shock level, however, 
was over twice that of the Zeaman 
and Wegner studies (30-volt alter- 
nating current as contrasted with 13- 
volt alternating current). For the 
most part, the measures indicating a 
decelerative CR were taken during 
the last two heart cycles of the CS- 
UCS interval. In answer to possible 
criticism that deceleration during this 
portion of the interval was not a 
representative CR, they also meas- 
ured the first two heart cycles of the 
CS-UCS interval (Notterman et al., 
1952c) and again found a decreasing 
heart rate, which was not, however, 
statistically significant. 

Owens and Gantt (1950) report a 
decelerative CR when the petting of 
a dog served as the UCS. The UCR 
to this stimulation was also a reduc- 
tion in heart rate. Mixed results re- 
garding the form of the heart rate CR 
were obtained by Beier (1940). One 
subject showed an accelerative CR, 
another a decelerative CR, and still 
another, conditioned arrhythmia. 
The UCS used in this experiment was 
the working of a bicycle ergometer by 
the subject.

Accelerative CR. Other experiments 
indicate a CR which is predominantly
accelerative in form. Skaggs (1926) 
used an auto horn CS and an induc- 
tion shock UCS, separated by 1 min- 
ute, to produce a mild increase in 
human heart rate (1.1 beats/minute). 
A greater increase in rate was ob- 
served between the “normal” condition
and the “expectancy” period
preceding the CS (9.4 beats/minute). 

Anderson and Parmenter (1941) 
demonstrated that the CR is an in- 
crease in heart rate when a buzzer or 
metronome CS is used with a shock 
pulse UCS. They further demon- 
strated “neurosis” in their sheep sub- 
jects with a discrimination procedure 
where only one of two stimuli was 
paired with shock. Neurotic subjects 
showed a higher and more irregular 
heart rate than normals in the experi- 
mental room, and gave an increase in 
heart rate to incidental stimuli 
whereas normals did not. 

Moore and Marcuse (1945) ran 
two sows daily for 10 months using a 
tone CS and food UCS. They found 
a reliable increase in heart rate upon 
presentation of the CS, which pre- 
ceded the UCS by 1 minute. Dyk- 
man and Gantt (1951) used a tone 
CS and a shock UCS, separated by 2 
minutes, to produce an accelerative 
CR in dogs. As noted earlier, Zea- 
man and Wegner (1954), using a 2- 
second shock UCS, showed an in- 
crease in heart rate in human subjects 
with onset of the CS.

CS-UCS Interval and Regularity. 
Church and Black (1958) using dog 
subjects also found an accelerative 
CR with a tone CS and a 3-second 
shock UCS. Their results indicate 
that CR latency is shorter for a 5- 
second CS-UCS interval than for a 
20-second CS-UCS interval. Latencies
were virtually the same for the
trace and delay conditioning procedures.
No substantial differences
in heart rate were observed between
the various experimental treatments. 
This last finding is to be contrasted 
with some results of Bersh, Notter- 
man, and Schoenfeld (1953) who 
found that an irregular time between 
CS and UCS produced more “anxi- 
ety” (i.e., heart rate CRs of greater 
magnitude) than a regular time be- 
tween. A condition where shock did 
not always follow the CS produced 
more anxiety than either of these 
conditions. 

Resistance to Extinction. One par- 
ticular disclosure from the Soviet car- 
diac conditioning work seems to be of 
special importance (Bykov, 1957). 
That is, the CR developed in pairing 
a neutral stimulus with a pharmaco- 
logical agent is very hard to extin- 
guish. For example, some 296 pres- 
entations of the CS alone were re- 
quired by Petrova to extinguish the 
cardiac CR. 

Gantt (Bykov, 1957) reports that 
a cardiac conditioned reflex to food 
may persist 2 years after the salivary 
and motor components have been 
extinguished. Notterman, Schoen- 
feld, and Bersh (1952c) found that 
irregular pairing of the UCS with 
the CS gave greater resistance to ex- 
tinction than regular reinforcement. 
They report further (1952a) that 
when subjects could avoid the shock 
UCS with a skeletal response, ex- 
tinction was more rapid than when 
subjects were told there would be no 
shock in extinction. Both of these 
treatments produced more rapid extinction
than the regular extinction
procedure. In a later experiment,
Bersh et al. (1956c) found that the
CRs of subjects who were forcibly re- 
strained from making the skeletal 
avoidance response extinguished more 
rapidly than the free avoidance sub- 
jects. 


Generalization. Stimulus generalization
of the CS has been demonstrated
by Dykman and Gantt (1951)
whose dog subjects differentiated be- 
tween 256, 512, and 1,024 cps tones 
with respect to heart rate, latency, 
and EKG amplitude. Bersh, Not- 
terman, and Schoenfeld (1956c) ob- 
tained generalization across tone fre- 
quencies as a function of intensity of 
the UCS. For a 28-volt alternating 
current shock UCS the human Ss 
showed a greater CR (depression of 
rate) and a flatter generalization 
across the 1,920, 1,020, 480, and 180 
cps tones than for a 20-volt alternat- 
ing current UCS. 

CR across Trials. Heart rate 
conditioning data collected by Daw- 
son (1953) are perhaps the best source 
of information for changes in the form 
of the heart rate CR across trials. 
They also illustrate sharply how de- 
ceptive a simple label such as a rate 
“increase” or “decrease” is in describing
the cardiac CR. So far as
such details are reported, most of the 
studies discussed earlier involved no 
more than 11 conditioning trials (e.g., 
Church & Black, 1958; Notterman et 
al., 1952b, 1952c; Zeaman & Wegner, 
1954, 1958). In the Dawson experi- 
ment 20 conditioning trials were used 
and the second-by-second forms of the
CR and UCR are shown for each five- 
trial block. These results show that 
the early CR is, in effect, an accelera- 
tion followed by a deceleration to the 
level preceding the CS. At this stage 
of conditioning a comparison of rates 
preceding and following the CS will 
show a net increase no matter which 
point within the CS-UCS interval is 
selected. As conditioning trials con- 
tinue, however, the decelerative phase 
of the CR becomes more pronounced, 
such that rate of the heart cycles dur- 
ing this phase is less than the rate pre- 
ceding the CS. Hence, increase or 
decrease in heart rate as the CR 
depends heavily upon the location
within the CS-UCS interval one uses, 
as well as the trial number (or num- 
ber of trials if trials are averaged). 
We may then add these factors to 
others which affect the CR, such as 
UCS length, CS-UCS interval, and 
the kind and intensity of the UCS. 
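The dependence on measurement location and trial number can be made concrete with a small Python sketch. The numbers below are invented for illustration only; they are not Dawson's data, which are not reproduced here. The names `CR_BY_BLOCK`, `WINDOWS`, `measured_cr`, and `label` are hypothetical. Each trial block is represented as an assumed second-by-second change in heart rate relative to the pre-CS rate over an assumed 6-second CS-UCS interval, and the sketch then collapses that analog curve into a single "increase" or "decrease" label, as a simple summary measure would.

```python
"""A hypothetical illustration of why a single 'increase' or 'decrease'
label can misdescribe a biphasic heart rate CR. All numbers are invented
and are NOT taken from Dawson (1953) or any other study."""

from statistics import mean

# Hypothetical change in heart rate (beats/minute) relative to the pre-CS
# rate, one value per second of an assumed 6-second CS-UCS interval.
CR_BY_BLOCK = {
    # Early trials: acceleration drifting back toward the pre-CS level.
    "early trials": [4.0, 3.0, 2.0, 1.5, 1.0, 0.5],
    # Later trials: acceleration followed by deceleration below the pre-CS level.
    "late trials": [3.0, 1.0, -1.0, -2.0, -3.0, -3.0],
}

# Hypothetical measurement windows (1-based seconds, inclusive).
WINDOWS = {
    "first 2 s": (1, 2),
    "last 2 s": (5, 6),
}


def measured_cr(deltas, start_s, end_s):
    """Mean change over seconds start_s..end_s (1-based, inclusive)."""
    return mean(deltas[start_s - 1:end_s])


def label(change):
    """Collapse an analog change into the simple up/down label."""
    if change > 0:
        return "increase"
    if change < 0:
        return "decrease"
    return "no change"


if __name__ == "__main__":
    for block, deltas in CR_BY_BLOCK.items():
        for window, (start, end) in WINDOWS.items():
            change = measured_cr(deltas, start, end)
            print(f"{block:<12} {window:<9} {change:+.2f} bpm -> {label(change)}")
```

With these invented numbers, the early block reads as an increase whichever window is sampled, while the late block reads as an increase in the first two seconds and a decrease in the last two, which is the ambiguity the text attributes to simple up-or-down labels.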


INTERACTION WITH OTHER 
BODY SYSTEMS


A factor which appears in elemen- 
tary physiology texts suggests that 
the heart, as such, may not learn at 
all. This factor, obvious enough per- 
haps to be invisible, is respiration. 
Recent quantitative data clearly 
show how breathing may affect heart 
rate (Clynes, 1960; Huttenlocher & 
Westcott, 1957). Both inspiration and 
expiration produce a biphasic cardiac 
response: a brief accelerative phase 
followed by a decelerative phase of 
longer duration. This biphasic car- 
diac response is of greater magnitude 
and has a shorter latency for inspira- 
tion than for expiration (Clynes, 
1960). Furthermore, it has been 
demonstrated that in a classical con- 
ditioning situation involving buzzer 
and shock, conditioned deep inspira- 
tions occur with the onset of the CS, 
and that the cardiac CR is a brief 
acceleration followed by a more pro- 
nounced deceleration (Huttenlocher 
& Westcott, 1957). 

Regardless of which portion of the 
respiratory cycle might be correlated 
with the CS, there is the frightful 
prospect that cardiac conditioning 
work thus far has, in fact, been un- 
knowingly concerned with respira- 
tory conditioning. Or, with some 
luck, cardiac conditioning has merely 
been contaminated by the respiratory
variable.

Fortunately, at least one cardiac 
conditioning experiment has been re- 
ported in which respiration was controlled.
Westcott (1959) instructed
subjects to breathe shallowly in time 
with a metronome during 10 CS 
(buzzer) alone trials and 10 condi- 
tioning trials when the CS and UCS 
(shock) were paired. The cardiac re- 
sponse in this experiment was a net 
drop in rate when the CS was given 
before conditioning, and a net in- 
crease in rate after the second condi- 
tioning trial. The conditioning curve 
was negatively accelerated across 
trials, showing an increase in heart 
rate over the pre-CS rate of 3.2 beats 
per minute on the last two trials. 
Respiration records showed consist- 
ent breathing on each trial, and across 
trials, for both frequency of respira- 
tion and the I/E ratios. 

There are still other doubts about 
cardiac conditioning which we should 
consider. Kendon Smith (1954) 
argues that all conditioned visceral 
responses are in reality artifacts be- 
cause they are brought on by activa- 
tion of the skeletal musculature. Ac- 
cording to this reasoning innate 
neural connections from the skeletal 
muscles activate the visceral systems
with a muscular “bracing” to the 
UCS. Skeletal reactions are said to 
provide numerous afferent cues 
whereas “autonomic reactions generate
no regulatory feedback whatever.”
Hence it is the skeletal system
which is conditioned and the visceral
system which merely accompanies.
These ideas find little support in the
cardiac literature discussing afferent
pathways
(e.g., Mitchell, 1956; Rushmer & 
Smith, 1959) for there is considerable 
anatomical evidence for autonomic 
feedback from the carotid and aortic 
bodies. 

The opposite hypothesis, that skeletal
responses can be mediated by
autonomic responses, is suggested by
Wenzel (1959). Her data show an 
increase in heart rate to tones associated
with food and a decrease to tones
associated with shock. Whether or 
not heart rate differentiation between 
the two conditions is related to auto- 
nomic mediation of skeletal responses 
is yet to be shown in the laboratory. 

Church and Black (1958) argue a 
similar case which is in line with Pav- 
lov’s “inhibition of delay.” They too 
suggest autonomic mediation of skel- 
etal responses. The tabulated laten- 
cies in their experimental report, 
however, tend to show shorter skele- 
tal than autonomic latencies, which, 
after all, are consistent with the time 
constants of the two systems. 

Perhaps the complexity of the or- 
ganism is such as to preclude such 
simple cause and effect hypotheses 
about the various systems. 


CONCLUSION


That the activity of the heart will 
change significantly in amplitude and 
rate in the presence of conditional 
stimuli is clear enough. There ap- 
pear, however, to be mixed emotions 
as to the form of the heart rate CR 
since some authors report an increased
rate, others a decreased rate,
and still others either increased or decreased
rate, depending on such factors
as the UCR. The original question
might then be “What does the
heart learn?” rather than “Does it learn?”
It is suggested that an answer to the 
second question may be found at two 
locations: at the desk, where heart 
rate changes would be treated as an- 
alog events rather than simple up or 
down events, and, at the laboratory, 
where like problems have previously 
been untangled with parametric 
study. 

Some tentative principles have 
been abstracted from the papers re- 
viewed: 

1. Both form of the EKG cycle 
and heart rate may be conditioned
with the classical paradigm.

2. Latency of the heart rate CR is 
less for a shorter CS-UCS interval. 

3. A CR of greater magnitude is 
produced when the UCS irregularly 
follows the CS. 

4. CR resistance to extinction is 
great when some pharmacological 
UCSs are used. Resistance to extinc- 
tion is increased with irregular CS- 
UCS pairings, and is decreased when 
UCS avoidance is made contingent 
upon a skeletal response. 

5. There is a generalization gradient
across tone frequencies as a function
of UCS intensity.

6. The heart rate CR changes 
across trials such that the dominant 
accelerative portion of the response 
decreases as the decelerative portion 
increases. 

Whether it is “really” the heart 
that learns, or something else such as 
the respiratory or the skeletal sys- 
tem, is perhaps a matter of degree. 
It seems unlikely that a particular 
bodily system is completely free from 
the influence of other bodily systems. 


MATERNAL DEPRIVATION:


TOWARD AN EMPIRICAL AND CONCEPTUAL 
RE-EVALUATION¹


LEON J. YARROW 
The significance of early infantile
experience for later development has 
been reiterated so frequently and so 
persistently that the general validity 
of this assertion is now almost un- 
challenged. An extensive literature 
on deviating patterns of maternal 
care, loosely labeled “maternal depri- 
vation,” adds up with an impressive 
consistency in its general conclusions: 
deviating conditions of maternal 
care in early life tend to be associated 
with later disturbances in intellectual 
and personal-social functioning. It 
has been difficult to build on this gen- 
eral premise in formulating more pre- 
cise research hypotheses relating 
specific variables of early maternal 
care to later developmental charac- 
teristics. If one attempts to order 
the empirical data from the many 
studies and the varied contexts, it 
becomes apparent that the concept of 
maternal deprivation is a rather 
muddied one. Maternal deprivation 
has been used as a broad descriptive 
term as well as an overall explanatory 
concept. As a descriptive term it 
encompasses a variety of conditions 
of infant care which are phenotypically
as well as dynamically very
different. In this review of the research
and theoretical literature, our
major objective is to clarify the concept
of maternal deprivation by
identifying the basic variables and
concepts which have been indiscriminately
combined under this
term.

1 Based on invited address, Division on
Developmental Psychology, sixty-seventh Annual
Convention of the American Psychological
Association, Cincinnati, Ohio, September
1959.

This paper was prepared in conjunction
Change in Mother-Figure during Infancy on
Personality Development, conducted under
Research Grant 3M-9077 from the National
Institute of Mental Health, United States
Public Health Service.

Previous reviews have dealt pri- 
marily with the findings (Bowlby, 
1951; Glaser & Eisenberg, 1956), or 
with the methodology of a few studies 
(Pinneau, 1950, 1955). The chief 
effort of this review will be directed 
towards sorting out on an empirical 
level the varied antecedent condi- 
tions of maternal care described in 
the literature, and relating these 
empirical conditions to some major 
theoretical concepts. Through this 
kind of analysis, it is hoped to facili- 
tate the formulation of more explicit 
hypotheses on the relationship be- 
tween specific aspects of early life 
experiences and later development. 


EMPIRICAL ANALYSIS OF THE RESEARCH
ON “MATERNAL DEPRIVATION”


In the literature on maternal dep- 
rivation, four different kinds of devia- 
tions from a hypothetical mode of 
maternal care have been included: 
institutionalization; separation from 
a mother or mother-substitute; mul- 
tiple mothering, in which there is no 
one continuous person performing 
the major mothering functions; dis- 
tortions in the quality of mothering, 
e.g., rejection, overprotection, ambivalence.
In very few studies do we
find these “pure conditions.” Most 
often several conditions occur con- 
comitantly or sequentially in com- 
plex interaction, e.g., separation is 
followed by institutionalization, mul- 
tiple mothering occurs in an institu- 
tional setting. 

Tables 1 to 4 present the chief 
research studies organized in terms 
of the major conditions of early care: 
institutionalization, separation, mul- 
tiple mothering. Studies on distor- 
tions in the mother-child relation- 
ship, e.g., rejection, overprotection, 
ambivalence, on which there are 
many clinical reports, but few re- 
search reports, have not been in- 
cluded. The studies presented in the 
tables are grouped according to their 
general research designs: retrospec- 
tive, direct, or contemporaneous. 
The tables point out the major char- 
acteristics of the samples: the popula- 
tion from which the subjects were 
chosen, the ages at the time of study, 
and the ages at the time of the ex- 
perience. Also presented are the 
major techniques used in data col- 
lection or the kinds of data obtained. 
For the retrospective studies, the 
presence or absence of data on earlier 
conditions of maternal care is noted. 
Finally, overlapping or contaminat- 
ing conditions are noted where they 
have been reported. 

It is clear from the tables that the 
major share of studies has been on 
institutional care. There are many 
fewer published reports on separa- 
tion and multiple mothering. In the 
following section, in considering each 
of these types of studies, our focus 
will be on an analysis of the environ- 
mental conditions and the impact of 
these events and conditions on de- 
velopment. Throughout we will at- 
tempt to integrate the empirical data 
in terms of some basic psychological 
concepts, and to point up some hypotheses
amenable to research.


INSTITUTIONALIZATION 


Most of the generalizations about 
the effects of “maternal deprivation”
are based on retrospective research in 
which institutionalization has been a 
major background condition. The 
general research designs of the many 
retrospective studies reported be- 
tween 1937 and 1955 are basically 
similar and tend to suffer from similar 
methodological deficiencies. In all 
but a few studies there is a sampling 
bias due to the method of selection of 
cases; subjects are chosen from clinic 
populations of cases under treatment 
for emotional or personality disturb- 
ances. (In delving back into the 
history of these patients, it was dis- 
covered that many had spent some 
part of their earlier life in an institu- 
tional setting.) Perhaps the most 
significant deficiency in many of these 
studies is the lack of specific data on 
early conditions of maternal care. 
The characteristics of the institu- 
tional environment are unknown or 
not described, and no data, or, at the 
best, very meager data are given 
about the circumstances associated 
with institutionalization. Such sig- 
nificant information as age at time of 
placement, duration of institutional 
care, traumatic conditions preceding 
or concomitant with institutional 
placement is rarely given. Frequently 
information about experiences fol- 
lowing institutional care is scant and 
of uncertain validity. The data on 
the personality characteristics of the 
subjects also vary greatly in depth 
and adequacy; much of the data are derived
from psychiatric diagnoses based on 
an unspecified number of interviews 
or consist of case history material 
from unspecified sources; in a few 
instances, projective or other kinds of
personality tests have been used. 


The Institutional Environment 


In much of the research on insti- 
tutions the environment has been 
dealt with so grossly that “institutionalization”
has often referred to
a setting as broad in many respects 
as “the home.” Only a few con- 
temporaneous studies of infants and 
young children give sufficiently de- 
tailed descriptions of the institu- 
tional setting to enable one to isolate 
discrete variables. Only one study, 
comparing the institutional and home 
environments of a small group of 
infants, makes a serious attempt to 
give an objective description of an 
institution (Rheingold, 1960). 

The institutional environments in 
the direct studies can be ordered 
in terms of several theoretically 
meaningful categories which can be 
further reduced to specific research 
variables. 

The physical environment—quality
and amount of sensory stimulation. 
The importance of sensory stimula- 
tion for development has recently 
been emphasized by a number of 
animal experiments. In most of the 
research, institutional settings are 
characterized in the extreme as lack- 
ing in sensory stimulation; they are 
described as colorless and drab with 
little visual or auditory stimulation 
and with few objects for the child to 
manipulate. 

The emotional environment—affec- 
tive stimulation. For research, the 
emotional environment can be de- 
fined in a restricted sense in terms of 
formal, measurable aspects of affec- 
tive stimulation, i.e., intensity and 
variability. Institutions tend to be 
characterized by an emotional bland- 
ness and a lack of variation in feeling 
tone with the result that the infant is 
not exposed to strongly negative or
strongly positive affective stimula- 
tion. 

The social environment—social stim- 
ulation. The amount of mothering, 
the quality and consistency of 
mothering, and the amount and 
quality of general social stimulation 
are major aspects of the animate 
environment in terms of which insti- 
tutional care is defined. Most of the 
studies describe a low adult-child 
ratio, averaging about one adult to 
10 infants in institutional settings. 
There are usually many different 
caretakers, with the result that the 
infant has little opportunity to relate 
to one person as a consistent source 
of gratification. The research indicates
that, compared with an infant in his own
home, an infant in an institution receives
much less mothering contact, less
total social stimulation, and less
stability in mother-figures.

Learning conditions. Learning 
conditions which deviate from those 
in a “normal” home environment are 
reported to be characteristic of institu-
tions: deviations in opportunities for
acquiring or practicing new skills, 
deviations in motivational condi- 
tions, and in scheduling. Often 
infants are confined to the crib or 
playpen during most of the day, with 
very limited opportunity to practice 
emerging motor skills or to make 
perceptual discriminations. There
tends to be little recognition by 
adults for positive achievements, 
with no or inconsistent reinforcement 
for positive learnings or socially 
desirable responses. Daily routines 
are sometimes characterized by an 
element of unpredictability, but more 
often routines are rigidly scheduled 
with little variation from day to day, 
and with little adaptation to indi- 
vidual differences. 

It is clear that institutionalization 
is not a simple variable, and cannot
be used as a simple research variable 
or explanatory concept. Even in the 
limited sample of institutions found 
in the direct studies, the environ- 
ments are not identical. Qualitative 
as well as quantitative variations are 
apparent among institutions in the 
amount of sensory stimulation, in the 
consistency of mothering, in the 
consistency of rewards, etc. 


Intellectual, Personality, and Social 
Characteristics Associated with Institu- 
tionalization 


Despite the methodological inade- 
quacies and the great range of ante- 
cedent conditions in the research, 
there is a core of consistency in the 
findings on the characteristics of 
children, adolescents, and adults with 
institutional backgrounds. The ma- 
jor characteristics associated with 
institutional care are: general intel- 
lectual retardation, retardation in 
language functions, and social and 
“personality” disturbances, chiefly 
disturbances centering around the 
capacity to establish and maintain 
close personal relationships. Within 
the overall consistency, however, 
there is significant variation. Not all 
children with institutional experience 
give evidence of intellectual or per- 
sonality damage, and there is a range 
in the extent of injury. These varia- 
tions can sometimes be related to the 
characteristics of the environment; 
sometimes significant modifying or 
interacting variables can be identi- 
fied. 

Intellectual defects. General intel- 
lectual retardation is commonly found 
in older children and adolescents 
with a history of institutionalization 
(Bender, 1947; Goldfarb, 1945a; Levy, 
1947; Lowrey, 1940) as well as in 
infants and young children growing 
up in institutional environments 
(Dennis & Najarian, 1957; Fischer,
1952, 1953; Gesell & Amatruda, 
1941; Skeels, Updegraff, Wellman, & 
Williams, 1938; Spitz, 1945, 1946). 
The data do not, however, permit 
the simple conclusion that gross 
intellectual deficiency is a necessary 
consequence of institutional experi- 
ence. The incidence and degree of 
retardation vary considerably from 
one study to another. In only some 
of the studies do some children show 
severe retardation (Dennis & 
Najarian, 1957; Gesell & Amatruda, 
1941; Goldfarb, 1945a, 1945b; Skeels 
et al., 1938; Spitz, 1945, 1946). In 
others there is only relative retarda- 
tion; they are functioning on a dull- 
normal level (DuPan & Roth, 1955; 
Fischer, 1952, 1953; Freud & Burling- 
ham, 1944; Klackenburg, 1956; 
Rheingold, 1956). Several factors 
seem to be related to the varied out- 
comes in intellectual functioning: 

1. The amount of individualized 
stimulation provided in these en- 
vironments seems to be significantly 
related to the degree of retardation. 
In the institutions in which attempts 
were made to provide individualized 
stimulation, and to foster a relation- 
ship between a single caretaker and 
infant, severe retardation was not 
found (DuPan & Roth, 1955; Fischer, 
1952, 1953; Freud & Burlingham, 
1944; Klackenburg, 1956; Rheingold, 
1956). 

2. The age of the child at the time 
of institutionalization varies greatly 
among the studies; several investi- 
gators have concluded that the 
younger the child at the time of 
institutionalization, the more likely is 
subsequent retardation (Bender, 1945, 
1947; Beres & Obers, 1950; Goldfarb, 
1947). The evidence is meager, con- 
sisting of data from two studies. In 
Goldfarb’s research in which a large 
percentage of cases showed evidence 
of retardation, the mean age of ad-
mission to the institution was 4.5 
months, with only three cases over 1 
year of age. Of a group of 37 adoles- 
cents and young adults studied by 
Beres and Obers (1950), only four
were mentally retarded; all four had
entered the institution before 6
months of age.

3. Constitutional factors. There 
are no direct data, but the findings 
that, in seemingly identical environ- 
ments, some children show retarda- 
tion and others do not, have been 
interpreted as evidence of constitu- 
tional differences in vulnerability to 
institutional deprivation. 

4. The duration of institutionaliza- 
tion. The data point to a cumulative 
impact of the institutional environ- 
ment on intellectual functioning. In 
most studies, with continued insti- 
tutional residence, infants show a 
progressive drop in developmental 
test quotients (Dennis & Najarian, 
1957; Fischer, 1952, 1953; Freud & 
Burlingham, 1944, no test data; 
Skeels, 1942; Skeels & Dye, 1939; 
Spitz, 1945, 1946). A few studies 
(DuPan & Roth, 1955; Rheingold, 
1956; Rheingold & Bayley, 1959) re-
port no significant cumulative loss in
intellectual functioning. Although 
Dennis and Najarian (1957) found a 
decrease in Cattell test scores in insti- 
tutionalized infants between 3 and 12 
months, they discovered no significant
retardation on the Goodenough 
Draw-A-Man Test among a group of 
older children, 4.5 to 6 years of age, 
who had been in the same institution 
for several years. They raise the in- 
teresting question as to whether an 
environment which fails to offer 
adequate intellectual stimulation to 
infants is necessarily retarding for 
preschool children. 

The direct association between 
intellectual retardation and environ- 
mental impoverishment is dramati-
cally emphasized by Skeels and Dye's 
study (1939). Retarded institutional 
children made significant gains in 
intellectual functions after special 
environmental stimulation. In an- 
other study (Skeels et al., 1938), the 
intellectual stimulation provided by 
an experimental nursery school in an 
institution was found effective in 
preventing deterioration in intel- 
lectual functioning. Whereas a con- 
trol group showed cumulative losses 
in IQ scores, children given nursery 
school experience maintained their 
IQ level. 

Two other studies suggest that 
intellectual retardation need not be 
attributed to some elusive, unknown 
aspect of the institutional environ- 
ment, but can be directly related to 
lack of adequate stimulation. Rhein-
gold (1943), studying infants in
boarding homes, found that children
who shared the home with several 
other babies had significantly lower 
developmental test scores than in- 
fants who were “only children” in the 
boarding homes. Coleman and 
Provence (1957) observed retarda- 
tion similar to the institutional pat- 
tern in children living in very un- 
stimulating home environments. 

Analysis of the separate aspects of 
intellectual functioning indicates that 
not all functions are equally affected
by institutional living. Consistent 
evidence of retardation is found in 
language, in time and space con- 
cepts, and in capacity for abstract 
conceptualization. 

Language is one function in which 
severe retardation has been found 
repeatedly in institutionalized in-
fants and young children (Brodbeck
& Irwin, 1946; DuPan & Roth, 1955; 
Fischer, 1952, 1953; Freud & Burling- 
ham, 1944; Gesell & Amatruda, 
1941; Skeels et al., 1938; Rheingold & 
Bayley, 1959) as well as in older chil-
dren and adults with an institutional 
history (Bender, 1945, 1947; Gold-
farb, 1945a; Haggerty, 1959; Lowrey,
1940). The literature on institutionaliza-
tion disagrees only on the age at which language
functions first seem to be affected. 
Brodbeck and Irwin (1946) found 
evidence of retardation in institu- 
tionalized infants in the first few 
months of life, whereas Freud and 
Burlingham (1944) report no indica- 
tions of language retardation before 
12 months. Brodbeck and Irwin’s 
data were based on careful phonetic 
analysis of speech sounds, whereas 
Freud and Burlingham had no syste- 
matic language data on infants. 

With regard to the etiology of 
language retardation, Fischer (1952, 
1953) notes that in many institutions 
there is little reinforcement by adults 
of the infant’s vocalizations, and 
consequently reduced opportunity 
for the child to acquire the signal 
functions and expressive functions of 
language. Recent data on the condi- 
tioning of vocalizations in infants 
(Rheingold, Gewirtz, & Ross, 1959) 
give evidence of the role of rein-
forcement in young infants’ vocaliza-
tions. Early studies of language
development (Day, 1932; Van 
Alstyne, 1929) pointed to a direct 
relationship between amount of en- 
vironmental stimulation (e.g., num- 
ber of hours the child was read to, 
“extensions of the environment”) 
and vocabulary and sentence length 
in preschool children. On the sim- 
plest level, language retardation, like 
general intellectual retardation, can 
be related to inadequate language 
stimulation. Lack of motivation for 
imitative behavior may interact with 
inadequate reinforcement of speech 
sounds in determining language re- 
tardation. 
Serious defects in time and spatial
concepts in older children have been 
reported in clinical descriptions by 
Goldfarb (1945a, 1949) and Bender 
(1945, 1947). Poor memory for past 
events is linked by Bender with such 
character defects as inability to bene- 
fit from past mistakes, lack of future 
goals, and weak motivation to control 
behavior for future gains. Goldfarb 
relates social maladjustment to diffi- 
culties in time and spatial concepts. 
In his view, these conceptual diffi-
culties lead to disregard of school and
family rules.

Disturbances in abstract thinking 
were also found by Bender (1947) 
and Goldfarb (1943b) in school-aged
children and in adolescents with an 
institutional background. Goldfarb 
(1945b) describes as characteristic of 
these children “an unusually defec-
tive level of conceptualization . . .
manifested in difficulty in organizing 
a variety of stimuli meaningfully and 
in abstracting relationships” (p. 251). 
On the Rorschach test, adolescents 
with an institutional background 
showed “an unusual adherence to a
concrete attitude and inadequate
conceptualization” (1943a, p. 222).

Motor functions. Motor development 
seems to be less severely affected
than any other aspect of develop- 
ment, although there are markedly 
discrepant reports. DuPan and Roth 
(1955) and Fischer (1953) conclude 
that there is no significant retarda- 
tion in motor development during the 
first year among institutionalized
children; Freud and Burlingham
(1944) report accelerated develop- 
ment during the early part of the 
second year, while Spitz (1946) notes 
marked retardation in motor func- 
tions during the first and second 
years. Differing opportunities for 
the exercise of developing motor 
functions in different institutional 
settings may be involved (Skeels
et al., 1938). 

Both extremes in activity level are 
found in institutionalized infants. 
Hyperactivity is sometimes noted 
(Fischer, 1952) but more common is 
a lowered activity level, associated 
with the general passivity noted as 
part of the pattern of intellectual 
retardation. There are only vague 
indications in the data of some fac- 
tors which may account for these 
different findings: constitutional dif- 
ferences among infants, the age or 
developmental level at the time of 
institutionalization, and the length 
of institutionalization. For instance, 
in the initial stages of institutionaliza- 
tion, hyperactivity is often found, 
with lowered activity level more 
common after prolonged institutional 
residence. 

Motor disturbances in the form of 
bizarre stereotyped motor patterns 
suggestive of neurological damage 
have been reported by Spitz (1946) 
in infants after a long period of insti- 
tutional residence; similar but less 
extreme motor disturbances were 
noted by Fischer (1952, 1953). In 
older children, Bender (1947) and 
Goldfarb (1943a, 1945b, 1947) found 
hyperkinetic behavior, a pattern 
considered part of a syndrome of 
impulsivity, with psychogenic rather 
than neurogenic bases. 

The findings on deviant motor 
patterns and the data on defects in 
conceptual thinking suggest the pos- 
sibility of central nervous system 
damage as a result of institutionaliza- 
tion. The evidence is not very strong, 
however, nor are there clear bases in 
these data for hypothesizing the con- 
ditions under which irreversible 
neurological damage might occur. 

Social and personality disturbances. 
Although the institutional syndrome 
has most frequently been described in 
terms of social and personality dis-
turbances, in many respects the data
are less clear than are the findings on
intellectual development. Personality
data are based primarily on clinical
impressions, and the characteristics
described are usually at the extreme
end of the scale, reflecting exaggerated
pathology or a complete lack of
capacity, rather than a relative
deficiency.

Interpersonal relationships. The 
major deviations reported in the 
literature are in the area of interper-
sonal relationships. Two overtly
dissimilar, but dynamically related, 
types of interpersonal disturbance 
have been described: social apathy 
manifested by indifference to social 
attachments, and “affect hunger” 
characterized by incessant and in- 
satiable seeking of affection. Several 
retrospective studies report a syn- 
drome in older children and adoles- 
cents described as an inability to 
establish close, warm personal rela- 
tionships (Bender, 1947; Bender & 
Yarnell, 1941; Goldfarb, 1943a, 1945b, 
1949; Lowrey, 1940), a personality 
pattern labeled the “affectionless
character” by Bowlby (1944), and 
one which Bender (1947) identifies as 
a psychopathic behavior disorder. 

In the contemporaneous studies of 
infants in institutions, social apathy 
is described in terms of several spe- 
cific response patterns: 

1. Inadequate social responsive-
ness, as evidenced by a complete lack
of social initiative, by withdrawn or 
apathetic response to social ap- 
proaches (Bakwin, 1949; Fischer, 
1952, 1953; Freud & Burlingham, 
1944), or by depressed scores on the
social sector of developmental tests
(DuPan & Roth, 1955; Fischer, 1952,
1953)

2. An indifference to social attach- 
ments, manifested by lack of any 
significant attachments or meaning-
ful relationships with caretakers in 
the institution (Freud & Burlingham, 
1944; Rheingold, 1956) 

3. Inadequate social discrimina- 
tion as evidenced by failure to give 
differentiated responses to strangers 
and familiar caretakers (Freud & 
Burlingham, 1944) 

4. A lack of normal social sensi- 
tivity, indicated by inability to re- 
spond discriminatively to different 
kinds of emotional expression (Freud 
& Burlingham, 1944) 

The specificity of the relationship 
between social stimulation and social 
responsiveness in infancy is pointed 
up by Rheingold’s data (1956). In- 
fants in an institution who were given 
intensive social stimulation by one 
mother-figure, from the sixth to the 
eighth month of life, showed signifi- 
cantly greater social responsiveness 
than control subjects cared for by 
the more usual institutional routine. 
General developmental progress was 
not affected, however, by this special 
type of stimulation. In a follow-up 
of these children in adoptive homes 
at 19 months of age, Rheingold and 
Bayley (1959) found no evidence of 
any lasting impact of this special 
experience. 

The syndrome of “affect hunger”
characterized by indiscriminate and 
insatiable demands for attention and 
affection is less common than social 
apathy. It is reported in several 
retrospective studies (Bender, 1945, 
1947; Goldfarb, 1945b; Lowrey, 
1940), but in only one contempo- 
raneous study (Freud & Burlingham, 
1944), in which children in an insti- 
tution are described as “exacting,
demanding, apparently passionate, 
but always disappointed in new 
attachments” (p. 58). A similar, 
but less intense pattern of indis-
criminate sociability among infants 6 to
8 months old was observed by
Rheingold (1956). Freud and Bur- 
lingham also noted in infants an 
associated pattern of exhibitionism, 
involving indiscriminate display of 
themselves before strangers. 

Behavioral deviations considered 
symptomatic of disturbances in ego 
and superego development have been
reported in older children (Bender, 
1945, 1947; Beres & Obers, 1950; 
Goldfarb, 1943a, 1949; Lowrey, 1940). 
Frequently noted is a pattern of 
diffuse and impulsive behavior sug- 
gesting a lack of normal inhibitory 
controls. In these children overt 
antisocial and aggressive behavior is 
often found. Bender and Goldfarb 
both note a lack of normal anxiety or 
guilt about aggression, a low frustra- 
tion tolerance, a lack of goal-directed- 
ness, and low achievement motiva- 
tion. Goldfarb (1943a) summarizes 
the personality pattern as impover- 
ished, meager, and undifferentiated, 
deficient in inhibition and control. 
Even as late as adolescence, the in- 
stitution children show the simple, 
unrefined, undifferentiated kind of 
behavior typical of preschool chil- 
dren. 

Beres and Obers (1950) is the one 
psychiatrically oriented study which 
raises some question as to the extent 
of personality damage resulting from 
institutionalization. They note a 
similar underlying pathology in all 
cases—a distortion in psychic struc- 
ture, an immature ego, and deficient 
superego development—but con- 
clude that by late adolescence about 
half of their 37 cases were making a 
favorable overt adjustment. They 
were 


functioning well, whether in work situation 
or at school . . . and presented no evidence of 
overt disturbance in their behavior or in their 
relationships within their families or among 
friends (p. 228). 
This study points up the problem for
research of making a valid distinction 
between mental health and pathol- 
ogy. These conclusions illustrate 
sharply the conflict between a defini- 
tion of mental health based on overt 
behavior and a definition derived 
from a psychodynamic assessment of 
strengths and liabilities.

In looking to the direct studies for 
clues to the antecedents of personality 
deviations in older children, one is 
disappointed by the limited data on 
the personality characteristics of 
infants in institutions. The meager 
data on infants suggest some pre- 
cursors of defective ego and superego 
development such as failure to show 
imitative behavior at the appropriate 
developmental period (Freud & Bur- 
lingham, 1944; Fischer, 1953). The 
conflicting findings on autoerotic 
activity emphasize the lack of agree- 
ment as to what constitutes normal 
behavior in infancy. Freud and 
Burlingham (1944) as well as Fischer 
(1952, 1953) describe a high incidence 
of thumbsucking, rocking, head-bang- 
ing in young infants, and masturba- 
tion in older children. Spitz and Wolf 
(1949), on the other hand, found 
“practically no autoerotic activities” 
among the infants in the foundling 
home. They hypothesize that an 
emotional relationship between the 
child and a mother-figure is a pre- 
requisite for the appearance of auto- 
erotic activities. 

Few direct studies give informa- 
tion on the age at which personality 
disturbances first become evident. 
In most of this research, the youngest 
children are over 6 months at the 
time of study. Where younger chil- 
dren have been studied, frequently 
no data are given on social or per-
sonality characteristics. Only two
studies offer data on the age at which 
personality disturbances are first 
noted. Freud and Burlingham (1944)
note that infants in their institution 
did not show signs of social retarda- 
tion before 5 months. Gesell and 
Amatruda (1941) report first signs of 
“social ‘ineptness’” evident at 24
weeks. 

The one experimental study on 
human infants (Dennis, 1941) is often 
cited as evidence that early sensory 
and social deprivation need have no 
impact on development. Dennis 
found no significant retardation in a 
pair of twins who were given “mini- 
mum” social and sensory stimulation 
during the first 7 months of life. 
Stone (1954) on the basis of a careful 
analysis of a later report (Dennis & 
Dennis, 1951) suggests that minimum 
stimulation probably represented 
minimal adequate stimulation, much 
more than that provided in many 
institutional environments. In Den- 
nis’ study the infants were handled 
for the normal routines, and there
was a consistent mother person. The 
fact that these conditions did not 
continue much beyond the first half 
year may also be significant. 

Many ad hoc theories have been 
offered to account for the intellectual 
and language retardation, the specific 
defects in abstract thinking, and the 
varied social and personality disturb- 
ances associated with institutionaliza- 
tion. The explanations which offer 
“maternal deprivation” as the basic 
etiological entity tend, on the whole, 
to be vague and generalized, and 
offer little basis for systematic re- 
search. With regard to abstract 
thought, Bender (1946) states: 


The earliest identification with the mother 
and her continuous affectional care is neces- 
sary during the period of habit training and 
the rapid development of language and the 
formation of concepts within the family unit. 
Otherwise the higher semantic and social 
development and the expansion of the educa- 
tional capacities does not take place (p. 76). 
(Quoted by permission of Child Study Asso-
ciation of America.) 


Regarding time concepts, she specu- 
lates, “It appears that we develop a 
concept of time in the passage of 
time in our early love relationships 
with our mother” (p. 96). Kardiner 
(1954) suggests that the sense of time 
develops in relation to the child's 
activities in looking forward to grati- 
fication. Goldfarb (1955) hypothe- 
sizes that lack of an adult identifica- 
tion model (in institutions) inhibits 
the development of functions such as 
language, which are dependent on 
social forms of imitation and com- 
munication. Impairment in abstract 
thinking is interpreted (Goldfarb, 
1955) in terms of Stern's theory 
(1938) which postulates that the 
development of conceptual thinking
is dependent on the growth of a sense 
of continuity of the self. According 
to Stern, the grasp of identity, as 
well as judgments of equality, simi- 
larity, and difference are all derived 
from the sense of continuity of self. 
At first these judgments are related 
to concrete personal events; even- 
tually, they are separated from them 
and become abstract. Without conti- 
nuity of mothering in an institution, 
Goldfarb contends, the normal de-
velopment of the self-concept is im-
paired, with resulting defects in 
abstract thought processes. Social 
and personality disturbances are 
linked directly to lack of opportunity 
for close human relationships in in- 
fancy in institutional environments. 
Goldfarb attributes defective ego and 
superego development to inadequate 
opportunity for the child to identify 
with parental figures and to internal- 
ize the parental image. Bender (1946) 
describes the etiology of personality 
disturbances in similar terms: 

There is a primary defect in ability to identify 
in their relationships with other people . . .
due to the fact that they never experienced a
continuous identification during the infantile
period from the early weeks through the
period when language and social concepts of
right and wrong are normally built up and
when psychosexual and personality develop-
ment are proceeding (p. 76). (Quoted by per-
mission of Child Study Association of Amer-
ica.)


She hypothesizes that anxiety and 
guilt arise in reaction to “threats to 
object relationship or identification 
processes” (p. 76). Lack of anxiety 
and inability to feel guilt are related 
to the lack of capacity to identify or 
form object relationships. 

Analysis of environmental variables 
in the research literature points to 
some more discrete factors than 
maternal deprivation in the institu-
tional setting. This elusive variable,
maternal deprivation, can be analyzed
in terms of variables more amenable 
to research, e.g., amount and quality 
of tactile, auditory, or visual stimu- 
lation; reinforcement schedules; etc. 
Harlow’s (1958) research on infant 
primates has demonstrated the effi- 
cacy for research of analyzing mother- 
ing in terms of simple stimulus con- 
ditions, such as contact stimulation. 
The discrepancies in the findings of 
the research on institutionalization 
suggest the need to consider inter- 
acting variables, such as constitu- 
tional differences in vulnerability, 
varying sensitivities at different de- 
velopmental stages, etc., in formu- 
lating hypotheses for more critical 
research testing. 


MATERNAL SEPARATION 


Maternal separation has never 
been studied under pure conditions. 
Most often separation has been as- 
sociated with other traumatic events 
such as illness and hospitalization or 
operative procedures, and often with 
parental rejection or death or dis-
ability of a parent. Frequently sepa-
ration from the parents has been
followed by institutional placement 
with the result that the impact of 
institutional influences is superim-
posed on the loss of parental figures.
In the literature on separation, the 
role of such contaminating variables 
has not been distinguished from the 
effects of a break in continuity of 
relationship with the mother. Spitz 
and Wolf's (1946) is the only study 
in which the physical environment 
remained unchanged following sepa- 
ration; it is one of the few studies in 
which the quality of the mother-child 
relationship prior to separation had 
been studied. 

Most of the research is contempo- 
raneous, reporting on the reactions of 
children at the time of separation. 
The long-term effects are almost un- 
known. Follow-up data more than a 
year later are given in a few studies 
(Bowlby, Ainsworth, Boston, & 
Rosenbluth, 1956; Lewis, 1954; Spitz, 
1954a, 1954b; Spitz & Wolf, 1946), 
but in these studies there are many 
contaminating conditions, e.g., se- 
verely disturbed parental relation- 
ships, repeated separations, intermit- 
tent institutionalization. 


Immediate Reactions to Separation 


Despite the many different condi- 
tions associated with the separation 
experience, there is some degree of 
consistency in the findings reported 
on immediate and short-term reac- 
tions of infants and preschool children 
to separation. In each of the studies 
some children develop apparently 
severe reactions, and the behavior 
sequences in these extreme cases ap- 
pear to be dynamically similar 
(Bowlby, 1953b; Robertson & Bowl-
by, 1952; Roudinesco, David, &
Nicolas, 1952; Spitz & Wolf, 1946). 
The characteristic sequence of re- 
sponses begins with active protest 
and violent emotional reactions, such
as intense and prolonged crying and 
active reaching out to people, in 
apparent attempts to bring back the 
mother or to find a substitute. In 
time this behavior is followed by ac- 
tive rejection of adults, and finally 
by apathy and withdrawal of interest 
in people, accompanied by a decrease 
in general activity level. Robertson 
and Bowlby characterize this latter 
phase as “mourning”; Spitz and
Wolf label it “anaclitic depression.”
Feeding disturbances—refusal of 
food, sometimes pathological appe- 
tite—and regression in motor and 
other functions are also reported.
When the mother is not restored, 
Spitz found symptoms of progressive 
deterioration in infants, a complete 
withdrawal from social interaction, a 
sharp drop in developmental level on 
infant tests, and extreme physical 
debilitation, with loss of weight and 
increased susceptibility to infections. 
In older children (over 12 months) 
marked physical and intellectual 
deterioration has not been reported,
but severe disturbances in interper- 
sonal relationships have been noted 
(Bowlby, 1953b; Robertson & Bowlby, 
1952). The “mourning phase” in
infants and young children is followed
by behavior described as a “denial of
the need for his own mother,” which 
Robertson and Bowlby interpret as 
an indication of a repression of the 
mother image. The child shows no 
apparent recognition of his own 
mother, but may transfer his attach- 
ment to a substitute mother. (There 
has been some controversy as to 
whether such behavior can be inter- 
preted as evidence of repression or 
whether it should be considered more 
simply as a denial mechanism— 
Bowlby, 1953a; Heinicke, 1956.) If 
no substitute mother is available, the 
child may show promiscuously
friendly behavior, using adults in an
instrumental way, but without es- 
tablishing meaningful attachments. 
Such behavior Bowlby considers in- 
dicative of a repression of all need for 
mothering, the prelude to a psycho- 
pathic character development. If, 
however, the child is reunited with 
his mother before the need for 
mothering is completely repressed 
(after some unstated critical time 
interval) the behavior pattern is 
believed to be reversible. The child 
is able on return to his mother to re- 
establish a relationship with her, al- 
though there may be several months 
of difficult adjustment, with irrita- 
bility, impulsive expression of feel- 
ings, and an exaggeratedly intense 
attachment. 

These descriptions of the reactions 
of young children to conditions in- 
volving loss of a mother-figure have 
provided the basis for most of the 
generalizations about the severe ef- 
fects of maternal separation. The 
dramatic character of these changes 
has overshadowed the significant 
fact that a substantial portion of the 
children in each study did not show 
severe reactions to separation. In 
Spitz’s study of 123 infants sepa- 
rated from their mothers between 6 
and 8 months of age, severe reactions 
occurred in only 19 cases. Although 
in Robertson and Bowlby’s (1952) 
research on 45 children ranging in 
age from 4 months to 4 years, all but 
three are reported to have shown 
some reaction, the intensity and
duration of the reactions are not 
clearly specified. Fewer than half, 20
cases, are reported as showing “acute
fretting,” a behavior pattern that
is not well defined. The reported
duration of the reaction varied from 
1 to 17 days. There are no data on 
the number of children who showed 
prolonged reactions. 
In a careful study of the reactions
to hospitalization of 76 infants under 
1 year of age (ranging from 3 to 51 
weeks) Schaffer (1958) found that 
reactions varied with age. Infants 
over 7 months of age showed overt 
social and emotional reactions, such 
as excessive crying, fear of strangers, 
clinging and overdependence on the 
mother. Infants under 7 months 
evidenced more global disturbances, 
i.e., somatic upsets, blank facial ex- 
pression, extreme preoccupation with 
the environment. Schaffer relates 
the global disturbances to sensory 
deprivation, whereas the social dis-
turbances at the later age, an age at
which a more differentiated relation-
ship with the mother exists, are
interpreted as reactions to separation 
from the mother. 

Heinicke’s research (1956) points 
to less severe effects of simpler
separation situations.
He found no extreme behavioral 
disturbances in two groups of chil- 
dren, 15 to 30 months of age, with 
different separation experiences, one 
group in a residential nursery, the 
other in a day nursery. The children 
in the residential nursery did show 
more overt and more intense aggres- 
sion, greater frequency of autoerotic 
activities, and more frequent lapses 
in sphincter control. These findings 
are interpreted as indicating an
imbalance between the child’s impulses
and his power to control and organize 
these impulses in relation to the 
external world. 


Long-Term Effects of Separation 


Conclusions about the long-term 
effects of separation are very tenuous. 
They are based on a few studies in 
which the information about the 
early history is not well-documented.

In an earlier study of 44 juvenile 
thieves, Bowlby (1944) concluded 
that separation experiences in child-
hood resulted in a character disorder
distinguished by a “lack of affection 
or feeling for anyone.” The conclu- 
sions are based on clinical findings 
that 12 out of 14 cases diagnosed as 
“affectionless characters” had been 
separated from their mothers in in- 
fancy or early childhood. Some of 
these children had been hospitalized 
for illness without any contact with 
their mothers over a long period of 
time, others had experienced frequent 
changes in foster mothers, and some 
had been institutionalized for long 
periods during infancy. 

In a follow-up study of 60 children 
between 6 and 13 years of age, who 
had been in a sanitarium for tubercu- 
losis for varying periods of time be- 
fore their fourth birthday, Bowlby 
et al. (1956) found less serious long- 
term effects than in the earlier stud- 
ies. No statistically significant differ- 
ence in intelligence was found be- 
tween the control and the sanitarium 
group. In personality characteristics, 
the sanitarium children were judged 
as showing tendencies towards with- 
drawal and apathy, as well as greater 
aggressiveness. On the basis of the 
psychiatric social worker's interview 
with the parents, 63% of the children 
were rated as maladjusted, 13% were 
considered well-adjusted, and 21% 
adjusted but with minor problems. 
Bowlby et al. conclude that “out-
come is immensely varied, and of
those who are damaged, only a small
minority develop those very serious
disabilities of personality which first 
drew attention to the pathogenic 
nature of the experience” (p. 240). 
They suggest that the potentially 
damaging effects of separation should 
not be minimized, but concede that
“some of the workers who first drew
attention to the dangers of maternal
deprivation resulting from separation
have tended on occasion to overstate
their case” (p. 242).

The findings of Lewis (1954) are 
sometimes cited as evidence that 
early separation need not have
lasting harmful effects. Among
a group of 500 children who were 
studied in a reception center shortly 
after being separated from their 
parents, only 19 showed “morbid lack 
of affective responsiveness” (p. 41). 
Follow-up data were obtained on 240 
of these children, 2-3.5 years later. 
Only 100 had a personal follow-up by 
a psychiatric social worker and a 
psychiatrist; information on the 
others was obtained through letters 
from social workers who had some 
contact with the children. Of the 100 
more intensively studied children, 
only three were diagnosed as having 
marked personality disorders, 22 
were having some difficulties in rela- 
tionships, and 36 were showing mild 
neurotic symptoms or mild delinquent 
behavior. With reference to the 
timing of separation, Lewis con- 
cludes that “separation from the 
mother before the age of five years 
was a prognostically adverse feature” 
(p. 122). Apparently this is a clini- 
cally based conclusion, since the data 
presented in the tables show no 
significant differences between the 
children separated before 5 years of 
age and those separated after 5. 

Data from several studies indicate 
that the impact of separation is 
modified by the character of the 
mother-child relationship preceding 
the separation experience and the 
adequacy of the substitute mothering 
following separation. Spitz and 
Wolf (1946) noted that the infants 
who did not develop severe depres- 
sive reactions were those separated 
from “poor mothers,” and concluded
that the better the mother-child 
relationship preceding separation, the
more severe the immediate reactions.
Lewis (1954), on the other hand,
found a higher proportion of children
in “good” or “fair” condition among
those who had been separated from
normally affectionate mothers than
among those who had not received
“adequate” affection. It might be
hypothesized that a close relationship
with a mother-figure preceding
separation will be followed by more
severe immediate reactions but a more
favorable ultimate outcome than a poor
antecedent relationship. Children who
have experienced a close relationship
in infancy may be better prepared to
form new attachments in later life
than children without any experience
of close relationships.

TABLE 4
RESEARCH ON MULTIPLE MOTHERING

Investigators: Rabin (1957)
Subjects: 38 children from kibbutz and 34 controls from neighboring villages
Age at time of study: 9-11 years
Age at time of experience: Birth to time of study
Techniques: Rorschach
Data on early experiences: General description of environment

Investigators: Rabin (1958a)
Subjects: 24 infants and 40 children in kibbutz; 20 control infants and 40 control children
Age at time of study: 9-17 months (infants); 9-11 years (children)
Age at time of experience: Birth to time of study
Techniques: Griffiths Infant Scale; Vineland Social Maturity; Goodenough Draw-A-Man; Rorschach
Data on early experiences: General description of environment

The amount, the quality, and the
consistency of substitute mothering 
will presumably influence the inten- 
sity of immediate reactions as well as 
the long-term personality conse- 
quences. Spitz and Wolf (1946) con- 
cluded that infants who were pro- 
vided with a satisfactory substitute 
mother did not develop the depres- 
sive syndrome. (There were no inde- 
pendent criteria of the adequacy of 
substitute mothering. The substitute 
relationship was considered satisfac- 
tory in those cases which did not 
develop depressive symptoms.) Rob- 
ertson and Bowlby (1952) also note 
that where an adequate substitute 
mother was provided, there was not a 
complete withdrawal from social 
contact. 


MULTIPLE MOTHERING 


Serious personality difficulties in 
later life have been postulated as a 
consequence of multiple mothering 
in infancy and early childhood. 
There has been little research, and in 
most of the clinical observations 
multiple mothering has been associated
with impersonal or rejecting maternal
care. The underlying assumption
in much of the literature is that in- 
adequate maternal care is a necessary 
concomitant of situations in which 
there is more than one mother-figure. 
Multiple mothering has never been 
very precisely defined. In its most 
general sense, it refers to an environ- 
mental setting in which a number of 
different persons perform the ma- 
ternal functions for the child, with 
varying degrees of adequacy and with 
varying degrees of consistency. From 
the child’s viewpoint, it may mean 
that there is no single person to whom 
he can relate as a major source of 
gratification and on whom his de- 
pendency needs can be focused. In 
some situations the biological mother 
may share the mothering functions 
with other chosen women; in other 
circumstances no biological tie exists 
between the child and the several 
mothers. Some current studies in 
home management houses, a few 
reports on the Israeli kibbutzim, and 
a very few anthropological reports 
provide all the available data on the 
effects of multiple mothering. 

In the anthropological accounts of 
multiple mothering in different cul- 
tural contexts (DuBois, 1944; Eggan, 
1945; Mead, 1935; Roscoe, 1953) 
there are variations in the number of 
people who share mothering functions 
as well as variations in the role of the 
natural mother. In cultures in which 
the extended family is the traditional 
pattern, the mothering functions 
may be shared by the mother, grand- 
mother, aunts, and other female rela- 
tives of the child; in some groups, 
male relatives may take over some 
maternal functions. The biological 
mother may be clearly identified as 
the central, most significant person 
in some cultures; in others she may 
be assigned a very secondary role. 

In Western cultures, grandmothers 
frequently assume some of the
mothering functions, and in some 
social groups, child nurses play an 
important role. In the pre-Civil War 
Southern plantation class group, 
many mothering functions were taken 
over by the Negro nurse. The line of 
demarcation between supplemental 
maternal care and multiple mother- 
ing has never been very clear. 

In none of these situations are 
disturbances in infant functioning 
associated with multiple mothering 
practices, nor are later personality 
characteristics or deviations attrib- 
uted to this aspect of early maternal 
care. 

The Israeli kibbutzim provide a
unique set of conditions of multiple 
mothering. In this setting, there are 
two mother-figures, the natural 
mother and the metapelet, the chil- 
dren’s caretaker, each of whom has 
very distinctive functions. The 
major share of the daily routine care 
as well as major training functions, 
such as toileting and impulse control, 
are assumed by the caretaker in the 
communal nurseries. The mother’s 
contacts with the child tend to be 
limited to scheduled periods during 
the day, which are free periods and 
do not involve traditional family 
routines. The mother seems to 
function solely as an agent to provide 
affectional gratification, although ob- 
viously the extent of the mother’s 
influence, as well as the specific areas 
of influence on the child’s develop- 
ment, will vary with her concept of 
her role and with her personality 
characteristics. 

There are several impressionistic 
reports (Golan, 1958; Irvine, 1952; 
Rapaport, 1958) and a few syste- 
matic studies (Rabin, 1957, 1958a) of 
the development of infants and chil- 
dren in the Israeli kibbutzim. Rabin 
(1958a), using the Griffiths Infant 
Developmental Scale, found slight
developmental retardation in infants 
between 9 and 17 months of age living 
in a communal nursery. In only one 
sector of development—the personal- 
social area—were these infants sig- 
nificantly retarded. Rabin attributes 
this retardation to less individual 
stimulation in the kibbutzim as com- 
pared to a normal home environment. 
This study represents the only re- 
ported research in a setting in which 
there may be deprivation in the 
amount of stimulation without con- 
comitant lack of affectional inter- 
change with the mother. 

In an attempt to assess the long- 
term effects of living under these 
special conditions of maternal care in 
the kibbutz, Rabin (1958a) studied a 
group of children, between 9 and 11 
years of age, who had lived in this 
environment from infancy. He found 
no evidence of retardation (using
the Goodenough Draw-A-Man Test), 
nor were there any indications of 
personality distortions. On the con- 
trary, Rorschach data are interpreted 
as indicating that the children from 
the communal settlements showed 
“better emotional control and greater 
overall maturity.” In ego-strength 
(using Beck’s index) they were judged 
superior to the control group of chil- 
dren living with their parents. Rabin 
interprets these findings as evidence 
of the important role of later experi- 
ences in personality development. 

In another study, Rabin (1958b) 
compared the psychosexual develop-
ment of 10-year-old kibbutz-reared
boys with boys from patriarchal-type
families. Using the Blacky test, he 
found significant differences, con- 
sistent with theoretical expectations. 
The kibbutz boys showed less 
“oedipal intensity,” more diffuse
positive identification with their 
fathers, and less intense sibling 
rivalry. This study also points up
the fact that multiple mothering is
only one of the significant factors
that differentiate the kibbutz from
the “normal” family setting. As is
the case with other conditions asso-
ciated with maternal deprivation,
the kibbutz is atypical in regard to
the absence of the father. 

Home management houses provide 
a setting in which multiple mothering 
occurs without associated depriva- 
tion of social stimulation. These 
houses are set up in university home 
economics departments to provide 
practical experience in child care for 
the students. The infant is separated 
from his foster mother or removed 
from a familiar institutional environ- 
ment and placed in the home manage- 
ment house for a period of several 
weeks to several months. He is cared 
for by a number of young women, 
each of whom assumes primary re- 
sponsibility for mothering activities 
for a limited period of time, usually 
about one week. There is one con- 
tinuous figure—the instructor in the 
house—with whom the infant can 
maintain a relationship; she assumes 
some of the ordinary child care func- 
tions. In the course of his residence 
in the home management house, the 
infant may have 15 to 20 different 
“mothers.” In this setting he re- 
ceives much attention and stimula- 
tion from many different “mother-
figures.” Following his residence in
the home management house, the 
infant is usually placed in a foster or 
adoptive home. The follow-up stud- 
ies and the several direct studies of 
children in home management houses 
(Gardner, Pease, & Hawkes, 1959; 
Gardner & Swiger, 1958) are in agree- 
ment in finding no evidence of intel- 
lectual retardation and no gross per- 
sonality disturbances. The long-term 
effects have not yet been evaluated. 
These three settings—the home
management house, the kibbutz, and
the extended family—are comparable
in only one respect: the mothering
functions are distributed among sev- 
eral different persons. They differ 
in regard to the continuity of the 
mother-figure, in the role played by 
the substitute mothers, and in the 
amount of social stimulation given to 
the infant. In some situations, because
of the high adult-child ratio, it is 
likely that the infant will receive 
more sensory as well as more social 
stimulation than the child in an 
average family home. For infants, 
the kibbutz may be similar to an 
institutional setting in terms of the 
amount of individual social stimula- 
tion provided. It is clear that none 
of these conditions necessarily in- 
volves severe deprivation of mother- 
ing, but the mothering experience of 
children in these settings may differ 
significantly from that of children in 
homes with one mother-figure. 

None of these studies provides a 
crucial test of the prevalent hypoth- 
esis that multiple mothering results 
in a diffusion of the mother-image. 
This theory, developed in the context 
of institutional care, holds that the 
child who is cared for by a number of 
different persons cannot develop a 
focused image of one significant 
mother-person in infancy, and con- 
sequently, will have difficulties in 
relationships in later life. On the 
whole, the few relevant pieces of 
research suggest that multiple 
mothering per se is not necessarily 
damaging to the child. 


DISTORTIONS IN THE MOTHER- 
CHILD RELATIONSHIP 


Although distortions in the mother- 
child relationship have frequently 
been included in the concept of ma- 
ternal deprivation, in this report we 
shall not attempt any comprehensive
review of this vast clinical literature. 
Institutionalization, separation, and 
multiple mothering represent devia- 
tions from a cultural norm of 
“mothering” primarily on the di- 
mension of amount or consistency of 
contact with the mother. Under the 
category of distortions in the mother- 
child relationship are subsumed all 
the deviations in maternal relation- 
ships which usually have as their 
antecedents disturbances in the char- 
acter or personality of the mother. 
These disturbances in maternal re- 
lationships are manifested in overtly 
or covertly hostile or rejecting be- 
havior, sometimes more subtly in 
overprotective behavior, and often in 
unpredictable swings from affection 
to rejection or in ambivalent behav- 
ior. As distinguished from a lack of 
social stimulation, a lack of respon- 
siveness, and the lack of a mother- 
figure, this type of deviation in 
maternal care tends to be character-
ized either by very strong emotional
stimulation or by stimulation with a
preponderance of negative affect. In 
contrast to institutional care, there 
may even be very intense intellectual 
stimulation. 

The literature on distorted ma- 
ternal relationships suggests a some- 
what different kind of personality 
outcome from the psychopathic or 
affectionless character. The person- 
ality distortions tend to be in the 
schizophrenic, depressive, and neu- 
rotic categories. Again there may be 
rather specific antecedent conditions 
and organismic vulnerabilities asso- 
ciated with these types of personality 
deviations (Spitz, 1951). A critical
review aimed at clarifying the
variables and analyzing the many
ad hoc theories concerning distorted
mother-child relationships is very
much needed.
SOME THEORETICAL ISSUES AND
RESEARCH IMPLICATIONS 


The data from the research on 
institutionalization, maternal separa- 
tion, and multiple mothering have 
relevance for a number of funda- 
mental issues in developmental 
theory: questions concerning the 
kinds of environmental conditions 
which facilitate, inhibit, or distort 
normal developmental progress; the 
conditions which influence the re- 
versibility of effects of events in in- 
fancy and early childhood; and the 
extent to which the timing of an 
experience, i.e., the developmental 
stage at which it occurs, determines 
its specific impact. 

In theories of the effects of early 
infantile experiences on later develop- 
ment, two concepts have been promi- 
nent: deprivation and stress. Al- 
though all the intricacies of the 
mother-child relationship cannot be 
conceptualized adequately in terms 
of these concepts, some of the en- 
vironmental conditions and events 
found in the research on maternal 
deprivation can be ordered meaning- 
fully in these terms. Deprivation is a 
key concept in the analysis of institu- 
tional environments. Many of the 
circumstances associated with ma- 
ternal separation and multiple 
mothering can be ordered in terms of 
the concept of stress. 


Deprivation 


In institutional settings several 
types of deprivation, each with po- 
tentially different developmental im- 
plications, can be distinguished: 
sensory deprivation, social depriva- 
tion, and emotional deprivation. In 
many settings all three types of 
deprivation occur and are complexly 
interrelated, but they do not neces- 
sarily vary concomitantly, and they 
can be independently manipulated in
research. 

The studies on sensory deprivation 
in animals indicate that complete 
restriction of perceptual experience 
in early life results in permanent im- 
pairment in the functions in which 
deprivation occurs. In the most 
extreme institutional environments 
the degree of sensory deprivation is 
less severe than in the animal studies. 
Nevertheless, developmental retarda- 
tion is found, with the extent of 
retardation corresponding to the 
degree of sensory deprivation. 

Social deprivation probably acts
in a way similar to deprivation of
sensory stimulation, leading to
disturbances in social functioning,
such as social apathy and social
hyperresponsiveness. The simplest
hypothesis relates social apathy to
inadequate social stimulation during a
developmental period which is critical 
for the acquisition of social respon- 
siveness. If social deprivation occurs 
after appropriate social responses 
have been learned, affect hunger or 
intensified seeking of social response 
may occur. Although social depriva- 
tion is less amenable to experimental 
manipulation than is sensory depriva- 
tion, in natural situations, some sim- 
ple indices can be used, such as the 
number of persons with whom the 
infant has contact during a 24-hour 
period and the amount of time during
which he receives stimulation. 

Emotional deprivation has been 
used popularly and in clinical writings 
as a catchall term to include depriva- 
tion of social, sensory, and affectional 
stimulation. For research, a more 
precise usage in terms of deprivation 
of affective stimulation may be use- 
ful. The term, emotional depriva- 
tion, can be restricted to characterize 
an environment with neutral feeling 
tone or without variation in feeling 
tone, an environment similar in some
respects to the monotonous, bland 
environment described under sen- 
sory deprivation. Emotional apathy, 
withdrawn behavior, lack of differ- 
entiation of affect, and insensitivity 
to feelings or emotional nuances in 
others are characteristics which 
might be related to early emotional 
deprivation. Within this concept of 
emotional deprivation, simple ob- 
jective measures are also possible, 
e.g., ratings of intensity of positive 
or negative affect, amount of time 
during a 24-hour period in which 
different types and intensities of 
affective stimulation are provided. 

In addition to independent manip- 
ulation of each of these types of 
stimulation—sensory, social, and 
emotional—in more focused research 
there might be systematic variation 
in several dimensions of stimulation: 
quality of stimulation, e.g., monoto- 
nous, varied; intensity; frequency; 
regularity; cumulative duration of 
deprivation; sensory modalities in 
which deprivation occurs. 


Stress Consequent to Change 


Critical research on maternal sepa- 
ration requires a distinction between 
the event of separation and later 
conditions often associated with sepa- 
ration which may be similar to those 
described under deprivation. The 
event of separation is associated 
with significant changes in the physi-
cal and social environments, changes
which may be stressful for the young 
child. In the physical environment, 
the changes involve the disappear- 
ance of familiar objects, sounds, 
smells, and tactile stimuli; in the 
social environment, there may be 
changes in the amount and quality of 
social stimulation. The new environ- 
ment may provide more tactile 
stimulation and less verbal
stimulation. There may be modifications in
the speed as well as kind of response 
to the child, e.g., the new caretaker 
may ignore the child's crying, or she 
may reward it by tactile stimulation 
rather than by oral gratification. 
For the infant or young child, these 
changes result in a loss of environ- 
mental predictability. The degree of 
stress experienced is likely to vary 
with the degree of unpredictability. 

Change and novelty as stress in- 
ducing agents can be studied through 
research designs providing for careful 
measurement or systematic variation 
in the physical and human environ-
ments, e.g., the degree of carryover of
familiar objects from the old to the 
new environment, the degree of 
similarity between the old and new 
caretakers in physical and psycho- 
logical characteristics, variations 
among the old and new mothers in 
the modalities in which stimulation is 
given. The impact of change in the 
physical environment might be evalu- 
ated by holding constant the human 
environment while systematically 
varying the physical environment, 
and conversely, the human environ- 
ment might be varied, with the 
physical environment constant. The 
amount of change necessary to pro- 
duce a discriminable difference to the 
child may vary with developmental 
factors. The significance of a change 
in the human environment will almost 
certainly depend on whether a mean- 
ingful relationship has developed 
with the mother-figure. If separation 
occurs after this point, the stress of 
change is reinforced by the loss of a
significant person. 

In the research on multiple mother- 
ing the one consistent characteristic 
of the varied contexts of multiple 
mothering is environmental unpre- 
dictability associated with changing 
agents of gratification.
Unpredictability may be based on differences in
technique among the different 
mother-figures, on variations in speed 
of response to the child’s expression 
of needs, or on inconsistency in the
kinds of behavior which are rewarded, 
punished, or ignored. Unlike separa- 
tion conditions in which new predic- 
table patterns may soon be estab- 
lished, in multiple mothering unpre- 
dictability remains the most charac- 
teristic aspect of the environment. 

There is neither strong research evi-
dence nor very firm theoretical
grounds to support the assumption 
that the presence of several concur- 
rent mother-figures in early life results 
in a diffusion of the mother-image 
and later inability to establish mean- 
ingful relationships. The variable 
conditions of reinforcement which 
characterize some multiple mothering 
situations provide a special kind of 
learning situation which may lead to 
the development of atypical patterns 
of relationships, but not necessarily 
shallow ones. It is likely that the 
presence of several mother-figures 
will vary in significance at different 
developmental periods. The lack of a 
consistent role model is probably 
more serious during the early pre- 
school period than in early infancy. 
In further research, attempts should 
be made to vary systematically the 
degree of stress associated with en- 
vironmental unpredictability, while 
controlling other variables such as 
degree of role differentiation among 
the multiple mothers. 

Although deprivation and trauma 
can be treated as independent con- 
cepts, there are conditions under 
which deprivation can be considered 
a traumatic stimulus. It is recog- 
nized that trauma may result from 
excessive stimulation, but the condi- 
tions under which inadequate stimu-
lation may be traumatic are more
obscure. Recent research indicates
that extreme sensory deprivation 
may be stressful for adults (Wexler, 
Mendelson, Leiderman, & Solomon, 
1959). We might assume that dep- 
rivation becomes a traumatic stimu- 
lus after the appropriate motiva- 
tional conditions have developed. 
Thus Hebb (1955) suggests: 


The observed results seem to mean, not that 
the stimulus of another attentive organism 
(the mother) is necessary from the first, but 
that it may become necessary only as psycho- 
logical dependence on the mother develops 
(p. 828). 


Research Implications 


Analysis of the research on institu- 
tionalization, separation, and mul- 
tiple mothering highlights some theo- 
retically significant questions and 
points to some specific variables 
which can be experimentally manipu- 
lated or controlled through the op- 
portunistic utilization of natural situ- 
ations. 

Duration of deprivation or stress. In 
much of the research, the subjects 
have experienced a cumulative series 
of deprivations or stressful experi- 
ences, beginning in infancy and con- 
tinuing through childhood. Few 
studies give specific data on the 
length of time the child has been 
exposed to these conditions. Gold- 
farb (1945b, 1947, 1955), Bender 
(1945, 1947), and Bowlby (1944) 
conclude from retrospective studies 
that the longer the period of institu- 
tional care, the more severe the ulti- 
mate damage. These conclusions are 
based largely on individual case 
findings. Those cases which did not 
show the same irreversible patterns 
as the rest of the population had been 
in institutions for a shorter period of 
time. Spitz and Wolf (1946) suggest 
that there may be a critical time 
interval after which the effects of 
maternal separation are irreversible. 
If the infant is reunited with his
mother within 3 months, the process 
of physical, social, and intellectual 
deterioration may be arrested, but if 
the mother-child relationship is not 
restored within 5 months, irreparable 
damage occurs. There are no com- 
parable data on children beyond 
infancy. One might hypothesize that 
the critical time interval might be 
longer with older children. 

Research on older children attests
to the damaging effects of repeated 
separations (Bowlby, 1944; Lewis, 
1954). On the whole, no distinction 
has been made among several differ- 
ent separation experiences: a single 
instance of separation with reunion, 
a single separation without reunion, 
repeated small doses of separation 
with consistent reunion with the 
same mother, and cumulative sepa- 
rations with repeated changes in 
mothers. It can be assumed that 
each of these experiences provides 
different learning conditions for the 
development of meaningful relation- 
ships. The most extreme outcome, 
the “affectionless character,” may be
the result of the most extreme condi- 
tions, i.e., repeated traumatic sepa- 
rations. 

Time or developmental stage at
which deprivation or stress occurs. 
Psychoanalytic theories regarding 
the significance of early experience 
for later development have often 
been interpreted as postulating that 
the younger the organism, the more 
severe and fixed the effects of an 
environmental impact. Only limited 
data are available on human subjects. 
Ribble (1943) tends to interpret her 
data on maternal rejection as sup- 
porting this point of view. Bender's 
and Goldfarb’s (1947) retrospective 
studies suggest that the younger the 
child, the more damaging the effects 
of deprivation and stress. Some animal research supports this hypothesis; other studies do not (Beach & Jaynes, 1954; King, 1958).

The findings on institutionalized 
infants that intellectual retardation 
is not apparent before 3 months of 
age and that personality disturbances 
are not evident before 5 or 6 months 
suggest that this type of deprivation 
has no significant impact in the early 
weeks of infancy. (Because of the 
known unreliability of infant tests, 
and the lack of sensitive measures of 
personality and intellectual functions 
in early infancy, some degree of 
caution is necessary in interpreting 
these findings.) 

A more refined hypothesis regard- 
ing the significance of the timing of 
experiences is the critical phase hy- 
pothesis which holds that there are 
points in the developmental cycle 
during which the organism may be 
particularly sensitive to certain kinds 
of events or most vulnerable to spe- 
cific types of deprivation or stress. 
Several animal studies (Moltz, 1960; 
Scott, Fredericson, & Fuller, 1951; 
Tinbergen, 1954) support the general 
outlines of the critical phase hy- 
pothesis. From the assorted data on 
the intellectual functioning of institu- 
tionalized children a testable hy- 
pothesis emerges regarding a critical 
period for institutional deprivation: 
vulnerability to intellectual damage 
is greatest during the 3-12 month 
period. Beres and Obers (1950) sug- 
gest that institutional deprivation 
will differ in its impact at different 
developmental periods. The data on 
which this conclusion is based are 
limited. Of their four cases showing 
mental retardation, all were ad- 
mitted to the institution under 6 
months of age; the four cases de- 
veloping schizophrenia entered the 
institution at a later age (specific 
age not reported). 
Although the general consensus in
the literature is that maternal sepa- 
ration which occurs before the child 
is 5 years of age is likely to be most 
damaging, the findings are not suffi- 
ciently clear to pinpoint any one age 
as being most vulnerable. Bowlby 
(1944) notes among the affectionless 
thieves: 
in practically all these cases, the separation 
which appears to have been pathogenic oc- 
curred after the age of six months, and in a 
majority after twelve months. This suggests 
that there is a lower age limit, before which 
separations, whilst perhaps having undesir- 
able effects, do not produce the particular re- 
sults we are concerned with here—the affec- 
tionless and delinquent character (p. 41). 
(Quoted by permission of the International 
Journal of Psycho-Analysis.) 


On the basis of our knowledge of 
the developmental characteristics of 
children, one might postulate differ- 
ing vulnerabilities at different periods 
of development. The developmental 
level of the child is likely to influence 
the significance of deprivation or the 
meaning of a separation experience 
for him. With regard to separation, 
the period during which the child is 
in the process of consolidating a rela- 
tionship with his mother may be an 
especially vulnerable one. Also sig- 
nificant may be the developmental 
stage with regard to memory func- 
tions. After the point in development 
at which the child can sustain an 
image of the mother in her absence 
and can anticipate her return, the 
meaning of a brief separation may be 
less severe than at an earlier develop- 
mental period. The degree of auton- 
omy the child has achieved may also 
affect the extent of trauma experi- 
enced. The loss of the mother may 
represent a greater threat to the 
completely dependent infant than to 
the young child who has achieved 
some locomotion and some manipulatory control over his environment. The advent of language, which symbolizes an even greater degree of environmental mastery, may further mitigate the severity of trauma.

Similarly, the effects of institu- 
tional deprivation may be more severe 
for the young infant who is com- 
pletely dependent on outside sources 
of stimulation than for the older 
child who is capable of seeking out 
stimulation. There may also be age 
linked effects of different types of 
deprivation. Some animal studies 
suggest that a minimal level of stimulation may be required to produce
the biochemical changes necessary 
for the development of the underlying 
structures. Deprivation in certain 
sensory modalities may be more 
significant at one age than at another. 
For example, deprivation of tactile 
stimulation may be most significant 
during the first weeks of infancy, 
whereas auditory or visual depriva- 
tion may become more significant 
later. Social deprivation may be 
most damaging during the earliest 
period of the development of social 
responsiveness. 

Constitutional factors. Although 
the role of constitutional factors in 
influencing the long-term effects of 
early trauma has been increasingly 
stressed, the meager data in support 
of the significance of constitutional 
factors have been indirect. Several 
retrospective studies have found the same kinds of deprivation experiences in the histories of individuals who in later life made satisfactory adjustments as in those of individuals who made poor adjustments. The different outcomes are
accounted for in terms of constitu- 
tional factors. In considering the role 
of constitutional factors a distinction 
might be made between organismic 
differences in general vulnerability to 
deprivation or stress and vulnera- 
bilities in specific sensory modalities. 
Data from a number of studies attest
to individual differences in sensitivi- 
ties in specific modalities. With re- 
gard to research design, it may be 
important, too, to distinguish be- 
tween organismic differences which 
are constitutionally determined and 
differences in vulnerability which 
vary with developmental stage. 
While organismic sensitivities can- 
not be manipulated experimentally, 
it may be possible to study constitu- 
tional factors by developing research 
designs in which subjects with known 
differences in sensitivities are sub- 
jected to the same experimental 
conditions. 


THE LONG-TERM EFFECTS: THE ISSUE OF REVERSIBILITY


It does not seem fruitful to state 
the question of reversibility in terms 
of an either-or hypothesis, i.e., 
whether or not early experiences 
produce irreversible effects. Rather 
the question might be: what are the 
conditions under which an earlier 
traumatic or depriving experience is 
likely to produce irreversible effects? 
The concept of irreversibility implies 
that an adverse experience results in 
permanent structural changes in the 
nervous system such that at some 
later developmental period a given 
response sequence is either facilitated 
or inhibited. A further implication is 
that subsequent experience plays no 
role in changing response potentiali- 
ties or in developing responses which 
are incompatible with earlier estab- 
lished behavior patterns. Several 
studies suggest that permanent dam- 
age to the central nervous system 
may result from early sensory deprivation. Increasingly, however, the research points to the resiliency of the organism. The study by Beres and Obers is one of the few investigations from the psychoanalytic orientation which make a strong case for the modifiability of the effects of earlier infantile experience. They cite in support a conclusion by Hartmann, Kris, and Lowenstein (1946) that

the basic structure of the personality and the 
basic functional interrelationship of the sys- 
tems of the ego and superego are fixed to some 
extent by the age of six, but after this age, the 
child does not stop growing and developing, 
and growth and development modify existing 
structure (p. 34). (Quoted by permission of 
International Universities Press, Inc.) 


Many factors in complex interac- 
tion undoubtedly determine the ex- 
tent to which recovery is possible 
from early intellectual or personality 
damage. More pointed research is 
needed to identify the specific condi- 
tions under which irreversible dam- 
age to the central nervous system 
occurs. Also needed are specific 
research designs on reversibility, 
designs aimed at reversing intellec- 
tual or personality damage. 


TOWARD A CONCEPT OF MATERNAL 
DEPRIVATION 


In focusing on the isolation of 
simple variables for formulating test- 
able hypotheses on the relationship 
between early environmental condi- 
tions and later development, we have 
avoided complex concepts centering 
around the emotional interchange 
between mother and infant, concepts 
which have been focal in psycho- 
dynamic theories. The mother as a 
social stimulus provides sensory stim- 
ulation to the infant through tactile, 
visual, and auditory media, i.e., 
through handling, cuddling, talking 
and playing with the child, as well as 
by simply being visually present. 
The mother also acts as a mediator of 
environmental stimuli, bringing the 
infant in contact with the environ- 
ment and buffering or heightening 
the intensity of stimuli. The mean- 
ing of these mothering activities to 
the child and the impact of the
mother’s absence vary with the
child’s perceptual, cognitive, and 
motor capacities at different develop- 
mental levels. On the simplest level, 
if the mother is not present, the in- 
fant may be deprived of tactile, 
auditory, and visual stimuli from a 
social source, as well as of the en- 
vironmental stimuli which the mother 
ordinarily makes available to him. 
At this point, the mother’s absence 
may be experienced by the young 
infant only as a deprivation of distinc- 
tive stimuli offered by a social being. 
The impact on the infant may be 
more severe if the mother’s absence 
is accompanied by deviations in 
need-gratification sequences, such as
failure to have needs anticipated or 
long delay before gratification is 
provided, by marked inconsistencies 
in patterns of gratification, or inade- 
quate gratification. The significance 
of these kinds of frustration experi- 
ences will be modified by the length of 
time during which they operate and by the
developmental level of the child, e.g., 
the degree of autonomy he has 
achieved. 

The usefulness of this reduction of 
maternal deprivation has been dem- 
onstrated in ordering the reported 
research findings and in suggesting 
more refined hypotheses for further 
research. It is likely, however, that 
not all aspects of the mother-child 
relationship can be meaningfully re- 
duced to such simple variables. We 
can only speculate on the process 
through which the mother comes to 
acquire special meaning to the child. 
We assume that the mother-image 
gradually evolves as a distinctive 
perceptual entity out of a welter of 
tactile, visual, auditory, and kines- 
thetic cues. (There has been some 
speculation, without definitive data, 
that in early infancy, before these sensory cues are organized into a percept of an object existing outside of himself, the infant may still “recognize” the mother as an assortment of
familiar stimuli.) In time through 
repeated contact these cues become 
“familiar” or distinctive to the in- 
fant, and finally there is a fixation of 
positive feelings on this perceptual 
complex. After the point of fixation 
of positive feelings on the mother, 
new elements enter into the child’s 
reactions to a loss or a change in 
mothers. At this point, sensory 
deprivation and environmental 
change may be secondary; the loss of
a significant person becomes of pri- 
mary significance. This experience 
cannot occur until the infant reaches 
a developmental point at which he is 
able to conceptualize the existence of 
an “object” outside of himself. As a
matter of conceptual clarity, it might 
be desirable to limit the concept of 
maternal deprivation to the condi- 
tions associated with the loss of a 
specific, cathected person, a person 
who has acquired distinctive signifi- 
cance for the child, one on whom 
positive feelings have been fixated. 


CONCLUSIONS 


The wide range of circumstances 
included under the concept of maternal deprivation stands out when the research is carefully scrutinized. Included are studies of children who have been separated from their parents and placed in institutional settings; other studies deal with children who have been grossly maltreated or rejected by their families; others are concerned with children temporarily separated from their parents because of illness; and in others the maternal functions are assumed by several different persons. These experiences
have occurred at different develop- 
mental stages in the children’s life 
histories, and there has been considerable variation in the length of exposure to these conditions, and in the
circumstances preceding and follow- 
ing the deviating conditions. 

It is apparent that the data on 
maternal deprivation are based on 
research of varying degrees of 
methodological rigor. Most of the 
data consist of descriptive clinical 
findings arrived at fortuitously rather 
than through planned research, and 
frequently the findings are based on 
retrospective analyses which have 
been narrowly directed toward verifi- 
cation of clinical hunches. 

The areas of knowledge and the 
areas of uncertainty become more 
sharply delimited when we break 
down the complex concept of ma- 
ternal deprivation into some discrete 
variables. For instance, in the studies 
on institutional care in which sensory 
deprivation emerges as a major vari- 
able, we can conclude that severe 
sensory deprivation before one year 
of age, if it continues for a sufficiently 
long period of time, is likely to be 
associated with severe intellectual 
damage. Direct observation of chil- 
dren undergoing the experience of 
maternal separation shows a variety 
of immediate disturbances in be- 
havior, permitting the simple conclu- 
sion that this is a stressful experience 
for children. There is no clear evi- 
dence that multiple mothering, with- 
out associated deprivation or stress, 
results in personality damage. 




With regard to the long-term 
effects of early deprivation or stress 
associated with institutionalization 
or maternal separation, no simple 
conclusions can be drawn. In the 
retrospective studies, significant in- 
teracting variables are usually un- 
known. Longitudinal studies cur- 
rently underway may offer data on 
the reinforcing or attenuating influ- 
ence of later experiences. We might 
hope for more pointed longitudinal 
studies on questions of reversibility,
such as, studies of human or animal 
subjects who have been subjected to 
experimental deprivation or trauma, 
or longitudinal studies of special 
populations chosen because of some 
known deviation from a cultural 
norm of mothering, e.g., infants who 
have experienced separation for adop- 
tion (Yarrow, 1955, 1956) and infants 
in multiple mothering situations 
(Pease & Gardner, 1958). 

The analysis of the literature points 
up the need for more definitive re- 
search on the role of many “nonma- 
ternal” variables, variables relating 
to the characteristics of environ- 
mental stimulation and variables 
dealing with organismic sensitivities. 
After clarification of the influence of 
such variables, then perhaps syste- 
matic research can come to grips 
with some of the more elusive aspects 
of the emotional interchange in the 
intimate dyadic relationship of 
mother and infant. 

In the history of the psychology of
perception few matters have been of 
more continuous interest than the 
relationship between perceived size 
and perceived distance. It is our 
objective to examine the current 
status of this question by reviewing 
the recent literature. With some 
exceptions our review will be con- 
fined to investigations which have 
been reported since 1952. Several 
surveys of the literature prior to 1952 
are available, and for this reason we 
will have relatively little to say about 
these earlier investigations (reviews 
can be found in Boring, 1942, Ch. 8; 
Vernon, 1954, Ch. 5; Woodworth & 
Schlosberg, 1954, Ch. 16).²

Most studies of this question have 
converged upon a single proposition 
which aptly has been called the Size- 
Distance Invariance Hypothesis. The 
invariance hypothesis is often stated 
in the following terms: “A retinal 
projection or visual angle of given 
size determines a unique ratio of ap- 
parent size to apparent distance” 
(Kilpatrick & Ittelson, 1953, p. 224). 
This proposition has been applied repeatedly in explanations of perceived size and distance in general, and in accounts of size constancy in particular.

179-187; Dember, 1960, pp. 169-192). The same can be said about the presentations contained in the recently published ophthalmological textbooks. Two illustrative discussions can be found in Bedrossian (1958, pp. 109-115) and Adler (1959, pp. 762-780).

Two variations of this fundamental 
proposition also have been asserted 
frequently. The first may be called 
the Known Size-Apparent Distance 
Hypothesis, and it can be derived 
directly from the more general propo- 
sition stated above. It may be ex- 
pressed as follows: an object of 
known physical size uniquely deter- 
mines the relation of the subtended 
visual angle to apparent distance. 
This hypothesis is the basis for many 
explanations of size as a cue for ap- 
parent distance. 
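The geometry behind the Known Size-Apparent Distance Hypothesis can be made concrete with a short sketch. It assumes a frontal extent viewed head-on and treats the distance implied by the visual angle as the predicted apparent distance; the function names and example values are illustrative only and are not taken from any of the studies reviewed.

```python
import math


def visual_angle(size: float, distance: float) -> float:
    """Visual angle (radians) subtended by a frontal extent `size` at `distance`."""
    return 2.0 * math.atan(size / (2.0 * distance))


def distance_from_known_size(known_size: float, angle: float) -> float:
    """Distance at which an object of `known_size` would subtend `angle` (radians).

    Under the Known Size-Apparent Distance Hypothesis this geometric distance
    is taken (by assumption, in this sketch) as the predicted apparent distance.
    """
    return known_size / (2.0 * math.tan(angle / 2.0))
```

With the visual angle held fixed, assuming a known size twice as large doubles the predicted distance: an angle produced by a 6-unit object at 60 units yields 60 units for an assumed size of 6 and 120 units for an assumed size of 12. This is the sense in which known size fixes the relation between visual angle and apparent distance.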

The second variation is often 
called Emmert’s Law, and in this 
form has been employed in investiga- 
tions of the size of the afterimage and 
its relationship to the distance of the 
projection surface. Woodworth and 
Schlosberg have stated the relation- 
ship in this way: “the judged size of 
the image is proportional to the 
distance” (1954, p. 486). A more 
general statement can be formulated 
also: the apparent size of an object 
will be proportional to distance when 
retinal size is constant. In this form 
the close relationship between this 
proposition and the broader Size- 
Distance Invariance Hypothesis is 
obvious. We have given the proposi- 
tion independent status because it 
has been applied mainly to questions 
concerned with the perceived size of 
the afterimage. 
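Emmert’s Law in the general form just stated amounts to a proportionality, which can be sketched as follows. The sketch assumes that the afterimage keeps a fixed angular size and takes the constant of proportionality to be the frontal size a real object would need to subtend that angle at unit distance; this choice of constant is our assumption and is not part of the law as quoted.

```python
import math


def emmert_apparent_size(angle_rad: float, surface_distance: float) -> float:
    """Predicted apparent size of an afterimage of fixed angular size.

    Emmert's Law: judged size is proportional to the distance of the
    projection surface. The constant 2*tan(angle/2), i.e. the frontal size
    subtending `angle_rad` at unit distance, is an illustrative assumption.
    """
    if surface_distance <= 0:
        raise ValueError("surface_distance must be positive")
    return 2.0 * math.tan(angle_rad / 2.0) * surface_distance
```

Whatever constant is chosen, doubling the distance of the projection surface doubles the predicted apparent size, which is all the law asserts.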

For clarity of exposition we have 
elected to review each of these propo- 
sitions separately. However, the 
reader will discover that on several
occasions we have violated these 
self-imposed boundaries. In the 
closing section of this paper we shall 
present some conclusions about the 
size-distance relationship in general. 


THE SIZE-DISTANCE INVARIANCE HYPOTHESIS


This hypothesis proposes an in- 
variant relationship between per- 
ceived size and distance such that 
the apparent size of an object is 
uniquely determined by an interac- 
tion of visual angle and apparent 
distance. 
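One way to write the invariance hypothesis symbolically is shown below. The notation is ours, not the authors': S and D are physical size and distance, S' and D' apparent size and distance, and α the visual angle. The second part adds a further assumption, a geometric form for f, under which apparent size is the size that would subtend α at the apparent distance.

```latex
\alpha = 2\arctan\!\left(\frac{S}{2D}\right), \qquad \frac{S'}{D'} = f(\alpha)

% Geometric form of f (an added assumption):
f(\alpha) = 2\tan\!\left(\frac{\alpha}{2}\right) \approx \alpha
\quad\Longrightarrow\quad
S' \approx \alpha\, D' \qquad (\text{small } \alpha)
```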

Support for the invariance hy- 
pothesis comes from studies which 
show that the size of an unfamiliar 
object can be judged accurately only 
if cues to the distance of the object 
are available. The prototypal experi- 
ment was performed by Holway and 
Boring (1941), who obtained size 
matches under four sets of conditions 
which represented a successive elimi- 
nation of distance cues. Size matches 
approximated constancy under con- 
ditions of binocular viewing and 
gradually approached the law of 
visual angle as distance cues were 
eliminated. However, perfect visual 
angle matches were not obtained 
even under the condition of greatest 
reduction. This was attributed to a “light haze” visible within the reduction tunnel due to light reflections in the corridor. When this cue was eliminated, perfect visual angle matches were obtained (Lichten & Lurie, 1950). These findings have been confirmed in more recent investigations which utilized a variety of stimulus objects and a variety of techniques for eliminating distance cues (e.g., Chalmers, 1952, 1953; Hastorf & Way, 1952; Renshaw, 1953; Zeigler & Leibowitz, 1957).

3 Various areas of relevant research have been omitted from this paper. Investigations dealing with the relationship between exposure time and perceived size (e.g., Allen, 1953; Comalli, 1951; Gulick & Stake, 1957; Howarth, 1951; Leibowitz, Chinetti, & Sidowsky, 1956) and the effects of relative visual direction on perceived size and distance (e.g., Gogel, 1954, 1956a, 1956b) have not been reviewed. We have also excluded reference to the developmental studies of size and distance. These investigations have been reviewed recently by Wohlwill (1960).

The results referred to above are 
usually interpreted as a straightfor- 
ward demonstration of the depend- 
ence of perceived size on perceived 
distance. However, we wish to point 
out that the introduction of the 
visual angle matches as evidence for 
the size-distance hypothesis involves 
at least one of the following two 
assumptions: (a) under conditions of 
complete reduction apparent distance 
tends toward zero, (b) under condi- 
tions of complete reduction apparent 
distance assumes some value other 
than zero which is the same for both 
the standard and the variable stimu- 
lus. 
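Assumption (b) can be checked numerically against the geometric form of the invariance hypothesis. The sketch below (function names and distances are hypothetical) assigns the standard and the comparison one common apparent distance and solves for the comparison size that yields equal apparent size. The result is the visual angle match whatever common distance is assumed, which is why assumption (b), if it held, would make visual angle matches consistent with the hypothesis.

```python
import math


def visual_angle(size: float, distance: float) -> float:
    """Visual angle (radians) of a frontal extent `size` at physical `distance`."""
    return 2.0 * math.atan(size / (2.0 * distance))


def apparent_size(angle: float, apparent_distance: float) -> float:
    """Geometric form of the invariance hypothesis (assumed): S' = 2 D' tan(angle/2)."""
    return 2.0 * apparent_distance * math.tan(angle / 2.0)


def match_under_equidistance(std_size: float, std_distance: float,
                             cmp_distance: float,
                             common_apparent_distance: float) -> float:
    """Physical comparison size whose apparent size equals the standard's,
    assuming both objects appear at `common_apparent_distance` (assumption b)."""
    target = apparent_size(visual_angle(std_size, std_distance),
                           common_apparent_distance)
    # Angle the comparison must subtend to look as large as the standard.
    cmp_angle = 2.0 * math.atan(target / (2.0 * common_apparent_distance))
    # Convert that angle back to a physical size at the comparison's real distance.
    return 2.0 * cmp_distance * math.tan(cmp_angle / 2.0)


def visual_angle_law_match(std_size: float, std_distance: float,
                           cmp_distance: float) -> float:
    """Comparison size subtending the same visual angle as the standard."""
    return std_size * cmp_distance / std_distance
```

For a standard of size 1.0 at 20 units and a comparison at 10 units, the equidistance match comes out at 0.5 whether the common apparent distance is taken as 5 or 50, the same as the visual angle law. A size-constancy match would instead be 1.0, the standard’s physical size.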

The first assumption is untenable 
in its original form since the value 
“zero” distance is meaningless in the 
experimental contexts described ear- 
lier. Perhaps, then, “zero distance” 
might be interpreted to mean in- 
determinate distance, i.e., distance 
which is not regulated by specifiable 
cues. Still, as Woodworth and 
Schlosberg note, “we just do not 
perceive free-floating objects at un- 
specified distances” (1954, p. 481). 
Instead, the object will be localized 
at some specific distance. According 
to the invariance hypothesis, the 
apparent distance for any given ob- 
server (O), whatever it is, should 
interact with the visual angle to de- 
termine apparent size. However, 
since the reduced situation is am- 
biguous it is likely that apparent 
distance will vary for different Os.
Under these conditions, the invari- 
ance hypothesis would predict cor- 
responding variations in the size 
matches. This prediction, of course,
is quite different from the consistent 
visual angle matches obtained by 
Holway and Boring, etc. For these 
reasons the first assumption stated 
in terms of “zero” distance or “indeterminate” distance is not very convincing to us.

The assumption of equidistance seems more plausible. Carlson
(1960a) and Wallach and McKenna 
(1960), addressing themselves to 
different aspects of the size-distance 
problem, have advanced the second 
assumption. Thus, Wallach and 
McKenna write that “the equation of 
image-sizes results from an implicit 
assumption of equal distance of the 
standard and the comparison object” 
(1960, p. 460). Carlson (1960a, p. 14) 
cites Gogel’s (1956b) experiments as 
evidence for a tendency to see objects 
as equidistant under the conditions 
of the reduction experiment. 

It is plain that a bias toward equi- 
distance would explain the obtained 
visual angle matches. Unfortunately, 
there is little empirical basis for the 
contention that this tendency actu- 
ally was operative. The experimental 
evidence for the equidistance tend- 
ency (Judd, 1898; Gogel, 1956b) was 
obtained when all of the objects in 
question were viewed simultane- 
ously. In the classic Holway-Boring 
investigation the standard and com- 
parison were viewed successively. 
Secondly, all of Gogel’s experiments 
dealt with instances in which a 
monocularly viewed object was local- 
ized at the same distance as a 
binocularly viewed object. Gogel 
presented no evidence that the same 
equidistance tendency is present when 
all objects are viewed monocularly.
However, the Holway-Boring results
were obtained when both standard 
and comparison were viewed monocu- 
larly. Finally, it should be noted 
“that the strength of the tendency 
for objects to appear equidistant 
decreases as the lateral line-of-sight 
separation of the objects is increased” 
(Gogel, 1956b, p. 16). This fact 
makes it highly unlikely that the 
equidistance tendency was effective 
in the Holway-Boring type of experi- 
ment. 

This analysis leads us to conclude 
that the applicability of the visual 
angle data as evidence for the in- 
variance hypothesis involves assump- 
tions whose validity has never been 
demonstrated. What is needed is a 
systematic experimental investiga- 
tion of apparent distance under vary- 
ing degrees of reduction including 
complete elimination of distance cues. 
In the absence of such information 
the consonance of visual angle 
matches with the invariance hypoth- 
esis is at best conjectural. 

The frequent appeals to the in- 
variance hypothesis in explanations 
of perceived size have endowed this 
proposition with almost axiomatic 
status. Nonetheless, evidence has 
been accumulating which casts doubt 
on the generality of this hypothesis. 
In what follows we shall describe a 
series of investigations whose out- 
comes have not been consonant with 
the invariance hypothesis. 


Overestimation in Size Judgments 


A frequently confirmed finding is 
size overestimation which increases 
with distance. As the physical dis- 
tance of the object is increased, the 
physical size of the object is progres- 
sively overestimated. While over- 
estimation is certainly surprising, it 
need not necessarily be inconsistent 
with the invariance hypothesis. If it 
should also turn out that apparent
distance increases more rapidly than 
physical distance, then the results 
demonstrating increasing overestima- 
tion of size could be reconciled easily. 
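The reconciliation just described can be spelled out under the geometric form of the invariance hypothesis, which is an assumption here (the symbols S, D, S', D', and α for physical size and distance, apparent size and distance, and visual angle are our notation). At a fixed visual angle the hypothesis fixes the ratio of apparent size to apparent distance, while geometry fixes the ratio of physical size to physical distance:

```latex
\frac{S'}{D'} = 2\tan\frac{\alpha}{2} = \frac{S}{D}
\quad\Longrightarrow\quad
\frac{S'}{S} = \frac{D'}{D}
```

Overestimation of size that grows with distance means that S'/S rises as D increases. On this reading the hypothesis survives only if D'/D rises as well, that is, if apparent distance grows proportionally faster than physical distance.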

Let us first consider those studies 
which report instances of overestima- 
tion of size which increases with dis- 
tance. Unless otherwise noted, the 
results to be described below were 
obtained with binocular vision and 
an objective matching attitude, i.e., 
O was instructed to match the stand- 
ard and comparison so that they 
would have the same physical size. 
Holway and Boring (1941) found 
that when O was allowed normal 
binocular vision, the apparent size of 
a disk of light increased more rapidly 
with increasing physical distance 
than did physical size. This finding 
was explained as a “space error” resulting from the fact that the variable
stimulus was always to the left of the 
standard. More recent experiments 
rule out this explanation. In an out- 
door setting, Gibson (1947, 1950) had 
O match the size of a distant stake 
with the size of one of a set of nearer 
stakes, which stood both to the right 
and to the left of the more distant 
stake. Overestimation of the size of 
the distant stake increased as its 
distance increased from approxi- 
mately 80 feet to 675 feet. The in- 
crease of estimated size with distance 
was greatest between 80 and 320 
feet. 

More recent experiments confirm 
Gibson’s findings. Gilinsky (1955a) 
investigated size perception of ob- 
jects presented out-of-doors at dis- 
tances ranging from 100 to 4,000 
feet. Size matches made under an 
“objective’’ set were greater than the 
physical size of the standards and 
increased with increasing distance of 
the standard. The acceleration of 
estimated size with distance was 
greatest between 100 and 400 feet.
Using somewhat shorter distances 
and three-dimensional stimulus ob- 
jects in an outdoor setting, Smith 
(1953) also demonstrated that ap- 
parent size increases with distance. 
Under Distance Condition N, the 
comparison was placed at 2 feet and 
the standard at 16, 80, or 320 feet. 
Under Condition R, the comparison 
was placed at the remote distances 
and the standard nearby. As the dis- 
tance of the standard was increased 
(Condition N) the size of the com- 
parison had to be made progressively 
larger than the physical size of the 
standard in order to achieve apparent 
equality. As the distance of the 
comparison was increased (Condition 
R) it had to be made increasingly 
smaller in order to match the stand- 
ard. At distances beyond 200 feet a 
comparison which was smaller than 
the physical size of the standard was 
required to produce apparent equal- 
ity. 

Increasing overestimation of size 
at distances of 20 feet and less has 
been demonstrated by Jenkin (1957, 
1959). In his first experiment, Jenkin 
(1957) found that when the com- 
parison was at 2 feet, it had to be 
made significantly larger than when 
it was at 10 feet in order to match 
the standard at 20 feet; i.e., apparent 
size increased significantly over the 
short distance interval from 2 to 10 
feet. Since the average match at the 
near position exceeded the physical 
size of the standard, and at the far 
position was exceedingly close to the 
physical size of the standard, over- 
estimation of size is indicated. This 
overestimation cannot be attributed 
to a space error because size judg- 
ments made with the variable at the 
same distance as the standard were 
not significantly different from the 
true size of the standard, while the
difference between true and judged 
size was highly significant when the 
standard was at 20 feet and the 
variable at 2. 

In order to study more fully the 
relationship between small incre- 
ments of distance and estimates of 
objective size, Jenkin (1959) per- 
formed a second and a third experi- 
ment, in which he presented com- 
parison stimuli at distances inter- 
mediate between those employed in 
his earlier study. In the second ex- 
periment, the comparison was located 
at a distance of 20, 40, or 160 inches. 
In the third experiment, a fourth 
distance (80 inches) was inserted 
between the 40- and 160-inch posi- 
tions. For all distances, mean size 
matches exceeded the physical size of 
the standard stimulus and became 
significantly larger as the comparison 
object was placed closer to O; i.e., 
overestimation of size increased with 
distance. The use of a third and 
fourth comparison distance made it 
possible to plot the results graphi- 
cally. When plotted against the 
logarithms of the distances, the mean 
size matches gave points fitted by a 
straight line. According to Jenkin 
(1959), this straight line relationship 
“suggests the existence of some 
hitherto undiscovered law relating 
apparent size and short increments 
in distance” (p. 348). 
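A straight line against the logarithm of distance corresponds to the model m = a + b ln d, where m is the mean size match and d the comparison distance. The sketch below fits that model by ordinary least squares. It is a generic illustration, not Jenkin’s own procedure, and any values passed to it in testing are synthetic numbers generated from known coefficients rather than his data. A negative slope b would express matches that grow as the comparison is brought closer to O.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(distances: Sequence[float],
                   matches: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of match = a + b * ln(distance).

    Returns (a, b). Distances must be positive and not all equal.
    """
    if len(distances) != len(matches) or len(distances) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(d) for d in distances]  # raises ValueError if d <= 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(matches) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("distances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, matches))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```

Because equal steps in ln d correspond to equal distance ratios, a good fit of this model means each doubling of comparison distance (20, 40, 80, 160 inches in the experiments above) changes the mean match by roughly the same amount, b ln 2.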

In his first experiment Jenkin used 
natural lighting. Coules (1955) has 
demonstrated that a brighter object 
farther away appears to be at the 
same distance as a nearer but dimmer 
object (see also Ittelson, 1952). If 
the more distant stimulus objects in 
Jenkin’s experiments received rela- 
tively less illumination than the 
nearer objects, then progressive dis- 
tance overestimation might have re- 
sulted. This in turn would account 
for the progressive overestimation of 
size which was obtained. In order to
control for differences in illumination 
in his second experiment, Jenkin 
(1959) varied the illumination of the 
standard stimulus between 11 foot- 
candles and 26.5 foot-candles, while 
keeping the illumination of the com- 
parison constant at 11 foot-candles. 
Differences in illumination of the 
standard had no significant effect 
either on amount of overestimation 
of size or on the rate at which it in- 
creased with distance. 

In order to determine whether in- 
creasing overestimation of size would 
occur with a familiar stimulus object, 
Jenkin (1959) permitted O to ex- 
amine the standard at a distance of 
24 inches for 5 seconds before making 
size matches with the standard at its 
usual distance of 320 inches. In- 
creased familiarity with the standard 
reduced the amount by which it was 
overestimated but did not affect the 
rate at which overestimation increased
with distance.

In a further experiment Jenkin 
(1959) tested the possibility that 
decreasing size matches are related to 
decreasing ratios of distance between 
standard and comparison objects. 
This was accomplished by placing the 
standard 80 inches in front of O instead
of 320 inches as formerly. If the 
distance ratio is crucial, then a 
steady decrement in the size match 
should be obtained from 20 to 80 
inches, and an increment in the size 
match should be observed at 160 
inches. The data of the third experi- 
ment did not confirm this expecta- 
tion. The size matches decreased as 
the comparison receded from 80 to 
160 inches. 

From the experimental evidence 
which we have summarized, it ap- 
pears that increasing overestimation 
of size is well-established. The in- 
variance hypothesis demands that 
increasing overestimation of size be
accompanied by a tendency for ap- 
parent distance to increase more 
rapidly than physical distance. At 
least one experiment indicates that 
apparent distance does increase in 
this way: Tada (1956) performed a 
bisection experiment in which second- 
ary cues to distance were eliminated. 
Using binocular vision, O made bi- 
sections by stopping one of two light 
spots when it appeared to be halfway 
between O and the second spot, which 
was fixed at a point designating the 
total distance to be bisected. In a 
second experiment, O’s task was to 
bisect a 2- or 4-meter interval, pre- 
sented at various distances from O, 
with each of its end points marked by 
a bright spot. In both experiments, 
Tada found that the phenomenal 
midpoint was farther than the ob- 
jective one. In other words, the 
farther half of the distance was over- 
estimated as compared with the 
nearer half. 

Tada’s findings are given some sup- 
port by Purdy and Gibson (1955). 
They found that when O was per- 
mitted full primary and secondary 
cues to distance, errors in dividing 
distances (up to 300 yards) into 
halves and thirds tended most fre- 
quently to involve making the nearer 
segment too large in comparison with 
the farther. However, few errors were 
made; in general, perceived magni- 
tudes of distance corresponded well 
with physical magnitudes of distance. 
Consistent findings of a large ac- 
celeration of apparent size with dis- 
tance would seem to demand a rea- 
sonably large and consistent tendency 
to overestimate the farther distance 
as compared with the nearer.

The invariance hypothesis is fur- 
ther weakened by the fact that at 
least two experiments on distance 
estimation give results exactly op- 
posite to those of Tada (1956). 


Gilinsky (1951) has presented evi- 
dence which indicates that perceived 
distance increases with true distance 
at a diminishing rate. The experi- 
menter (E) moved a pointer at a slow 
and nearly constant rate along the 
ground away from O, who instructed
E to mark off successive increments 
of equal perceived length. In the case 
of one O, every increment of apparent 
distance represented an attempt to 
match a memorized “subjective foot 
rule”; in the case of the other O, a 
memorized “subjective meter stick”
was being matched. For both Os 
apparent distance increased more 
slowly than physical distance. This 
experiment is defective because error 
in bisection experiments is related to 
the direction of motion of the pointer; 
as the pointer withdraws, O tends to 
make the farther segment too large 
in comparison with the nearer (Purdy 
& Gibson, 1955). This defect was 
avoided in a second experiment by 
Gilinsky (1951). Across a large, flat 
lawn, a line was stretched perpendicular
to the frontal-parallel plane of O.
O was required to bisect each one of 
14 distances, ranging from 8 to 200 
feet, by stopping a pointer, which 
moved back and forth along the line, 
at a point which appeared to be half- 
way between the near end of the line 
and a marker indicating the total 
distance to be bisected. The results 
were the same as in the previous 
experiment. 
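The direction of bisection errors can be connected to how fast apparent distance grows. As a purely hypothetical assumption (none of the studies above is said to have used this model), let apparent distance be a power function of physical distance, D' = k·D^n. The point x that appears to bisect a total distance T then satisfies k·x^n = k·T^n / 2, so x = T·2^(-1/n). With n < 1, apparent distance grows more slowly than physical distance and the setting falls short of the true midpoint, as in Gilinsky's results. With n > 1, it falls beyond the midpoint, as in Tada's. The function name is illustrative.

```python
def apparent_bisection_point(total_distance, exponent):
    """Physical distance that appears to bisect total_distance.

    Assumes (hypothetically) apparent distance = k * D ** exponent. Solving
    k * x ** exponent == k * total_distance ** exponent / 2 gives the result;
    the scale factor k cancels, so only the exponent matters.
    """
    if total_distance <= 0 or exponent <= 0:
        raise ValueError("total_distance and exponent must be positive")
    return total_distance * 2 ** (-1 / exponent)

total = 200.0  # feet; the longest distance Gilinsky's observers bisected
for n in (0.8, 1.0, 1.25):
    x = apparent_bisection_point(total, n)
    print(f"n = {n}: setting at {x:.1f} ft (true midpoint {total / 2:.1f} ft)")
```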

Smith (1958) also found that far 
distances tend to be underestimated 
in comparison with near ones. As a 
standard stimulus he used a white 
sheet of oilcloth, which was spread 
out on the floor of a hall. The variable 
stimulus was a strip of the same oil- 
cloth rolled from a small roller. To 
match the length of the standard, the 
variable was made 15.1% longer than 
the standard. 

The invariance hypothesis must be 
abandoned if we accept both the
finding of apparent distance which 
increases less rapidly than physical 
distance and the finding of increasing 
overestimation of size. A way out of 
this difficulty is suggested by Carlson 
(1960b), who maintains that increas- 
ing overestimation of size is an arti- 
fact of “objective” instructions. 
When O is trying to judge actual, 
physical size, his size matches will be 
influenced by his beliefs about size- 
distance relationships. The major 
attitude by which O will be influ- 
enced is the concept of perspective— 
the notion that apparent size becomes 
smaller as distance increases. “From 
O's point of view, a near object must 
‘look’ larger than a far object for the 
two to be equal in physical size” 
(Carlson, 1960b, p. 200). Hence O 
will make size matches which appear 
to indicate an overestimation of the 
far object. 

Given several discriminably different dis- 
tances in the same setting, amount of over- 
estimation may be a fairly precise function of 
distance, but only because trials at different 
distances are not really independent, and O 
can judge the distances relative to each other 
(p. 201). 


In support of his thesis, Carlson 
(1960b) pointed out that overestima- 
tion does not occur in experiments, 
such as those of Brunswik (1956, pp. 
67-69) and Singer (1952), in which O 
is asked to base his size judgments 
upon a naive, natural impression of 
size (“look” instructions). Carlson
(1960b) performed an experiment in
which O was allowed, but not required,
to differentiate apparent visual size 
from objective size. Using free binoc- 
ular regard, O adjusted a 10-foot 
distant variable triangle to match a 
40-foot distant standard. O was also 
required to bisect the distance to the 
apparatus on which the standard tri- 
angle had been presented. Under 
apparent size instructions, size 
matches were accurate. Under objective
size instructions, overestimation
of size occurred. We are told
that under both “look” and “objective”
instructions, “the half-distance
of the standard was . . . overestimated”
(p. 206). Apparently this
means that O, in bisecting the dis- 
tance to the standard’s apparatus, 
required the marker to be placed too 
close to himself. If so, the results 
indicate that apparent distance in- 
creases less rapidly than physical 
distance. 

It is doubtful that Carlson has re- 
moved the difficulties facing the in- 
variance hypothesis. Carlson (1960b) 
used only one pair of distances; if 
either the standard or the variable 
had been placed at more than one 
distance, he might have found that 
estimated size increases with distance
under “look” instructions, even
though overestimation does not oc- 
cur. The published data of Brunswik 
(1956, pp. 67-69) and of Singer (1952) 
do not provide an answer to this 
question. Furthermore, Carlson's 
finding that size is accurately esti- 
mated does not match his finding 
that apparent distance increases less 
rapidly than physical distance. 

Instead of linearly increasing over- 
estimation of size, some investigators 
have reported a curvilinear relation- 
ship between physical distance and 
overestimation of size. Hastorf and 
Way (1952) found that when dis- 
tance cues are available, overestima- 
tion of size increases from 10 to 20 
feet and decreases from 20 to 30 feet. 
Chalmers (1952) found that overes- 
timation increased from 10 to 20 feet 
and decreased from 20 to 50 feet 
when O viewed the 10-foot compari- 
son binocularly. 

It should be noted that even if the
reported instances of progressive
overestimation of size were accounted
for by progressive overestimation of distance,
this would still leave unexplained the
curvilinear size-distance relationship obtained
by several investigators.


Nonmatching Judgments of Size and Distance


According to the invariance hy- 
pothesis, the perceived size of an 
object is proportional to its perceived 
distance, when its retinal image size 
is held constant. This requirement of 
proportionality is frequently not met 
when size and distance judgments 
are both made in the same experi- 
mental setting. For purposes of expo- 
sition, we may divide the experiments 
which produce nonproportional re- 
sults into two classes: (a) In the first 
class are included those experiments 
which provide evidence for a ‘‘size- 
distance paradox’’—a consistent tend- 
ency either to couple an underestima- 
tion of the relative size of an object 
with an overestimation of its relative 
distance or to couple an overestima- 
tion of the relative size of an object 
with an underestimation of its rela- 
tive distance. (b) In the second class 
are those experiments which show 
that a variable having a consistent 
influence on size judgments has no 
consistent influence on distance judg- 
ments, and, correlatively, those ex- 
periments which show that a variable 
having a consistent influence on 
distance judgments has no consistent 
influence on size judgments. 
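The two classes can be stated operationally. In the hypothetical sketch below, each judgment is expressed as a signed proportional error of relative size or relative distance, with positive values meaning overestimation. With retinal image size held constant, the invariance hypothesis requires the two errors to have the same sign. Opposite signs correspond to the size-distance paradox of the first class. A shift in one judgment with no shift in the other corresponds, in a simplified single-comparison form, to the second class, which the text defines in terms of a variable's effects across conditions. The function name and the tolerance below which an error counts as zero are assumptions.

```python
def classify_judgments(size_error, distance_error, tol=0.02):
    """Classify a pair of relative errors against the invariance hypothesis.

    size_error, distance_error: signed proportional errors, e.g. +0.10 means
    the relative size (or distance) was overestimated by 10%.
    tol: hypothetical threshold below which an error is treated as zero.
    """
    def sign(e):
        return 0 if abs(e) < tol else (1 if e > 0 else -1)

    s, d = sign(size_error), sign(distance_error)
    if s == d:
        return "consistent"  # same direction, or both negligible
    if s == 0 or d == 0:
        return "class 2: dissociation"  # one judgment shifts, the other does not
    return "class 1: size-distance paradox"  # opposite directions

# Overestimated relative size paired with underestimated relative distance.
print(classify_judgments(+0.15, -0.10))  # class 1: size-distance paradox
```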

Class 1: The Size-Distance Paradox. 
A striking example of the size-dis- 
tance paradox is the moon illusion. 
As is well known, the moon appears 
larger on the horizon than at the 
zenith. According to the invariance 
hypothesis, it should also look farther 
away. Yet O usually reports that the 
moon looks closer when it is low in 
the sky. The most recent discussion 
of this time-honored problem is by
Kaufman and Rock (1960). 


More detailed evidence for the
size-distance paradox is provided in
an experiment by Gruber (1954).
The standard stimulus was a tri- 
angle which was alternately 10 and 15 
centimeters in height. To the right of 
the standard was a variably sized 
triangle. This variable triangle was 
placed at six distances ranging from 
200 to 450 centimeters. For each 
distance O made four kinds of judg- 
ments, all of them under “look” 
instructions: 


1. O set the variable-size triangle so that 
it appeared equal in size to the standard (a) 
when the standard was half as far from O as 
the variable, and (b) when both stimulus ob- 
jects were equidistant from O. 

2. O adjusted the distance of the standard
so that it appeared (a) half as distant as the 
variable, and (b) equidistant with the vari- 
able. 


The results were all contradictory to 
the invariance hypothesis: 

1. By setting the size of the vari- 
able significantly larger than the 
actual size of the standard in the size 
constancy matches (judgments of 
Type 1a), Os exhibited a mean over- 
estimation of the relative size of the 
standard. However, a mean under- 
estimation of the relative distance of 
the standard occurred; O set the 
standard sized triangle too far away 
in the half-distance judgments. 

2. “Analysis of individual differences
revealed no correlation between
size and distance judgments”
(Gruber, 1954, p. 426).

3. As the physical distance of the 
farther object increased, the mean 
constant error in size constancy 
matches rose from 4% to 23%,
whereas the mean constant error in 
half-distance judgments did not vary 
progressively with absolute distance, 
fluctuating around 17%. 

4. The mean errors in the control 
judgments (1b and 2b) were not
large enough to account for the 
magnitude of the errors in the size 
constancy and half-distance judgments.
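The percentages in Finding 3 are constant errors. Assuming the conventional definition (the signed difference between the mean setting and the correct value, expressed as a percentage of the correct value; the review does not state Gruber's exact computation), they can be computed as below. The settings are hypothetical and the function name is illustrative.

```python
from statistics import mean

def constant_error_pct(settings, correct_value):
    """Signed constant error as a percentage of the correct value.

    Positive results mean the settings were too large on average,
    negative results that they were too small.
    """
    if correct_value == 0:
        raise ValueError("correct_value must be nonzero")
    return 100.0 * (mean(settings) - correct_value) / correct_value

# Hypothetical size settings (cm) for a 10-cm standard; not Gruber's data.
settings = [11.4, 11.6, 11.5]
print(f"constant error = {constant_error_pct(settings, 10.0):+.1f}%")
```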

By means of her size-distance 
equations, Gilinsky (1955b) at- 
tempted to show that Gruber's data 
are properly interpreted as support- 
ing rather than rejecting the hy- 
pothesis that perceived size is pro- 
portional to perceived distance. How- 
ever, Gruber (1956) has pointed out 
that Gilinsky’s analysis does not 
apply to his most interesting result, 
Finding 1 above, that an object
which is consistently overestimated 
in size is consistently underestimated 
in distance. Gilinsky’s analysis deals 
only with Finding 3, and in order to 
do so, it must make use of a number 
of somewhat arbitrary assumptions. 

Jenkin and Hyman (1959) report 
that when Os are given an “objective”
set, Gruber's finding of a size-distance
paradox is confirmed. Size
judgments were obtained under four 
different distance conditions: (a) 
comparison 30 feet and standard 15 
feet from O, (b) comparison 30 feet 
and standard 2 feet from O, (c) com- 
parison 15 feet and standard 1 foot 
from O, and (d) comparison 15 feet 
and standard 15 feet from O. O made 
size judgments under two different 
instructions: to match for physical 
size, and to match for retinal image 
size. Following the size judgments, 
the black mounting-board upon 
which the variable had appeared was 
placed 30 feet from O, who was re- 
quired to make estimates (in feet) of 
this distance. Under objective in- 
structions, Os either judged the 
variable as relatively small and its 
mounting as relatively remote, or as 
relatively large and its mounting as 
relatively near. 


The relationship of analytic size-judgments 
to estimated distance was toward the distant 
object being regarded as relatively large and 
relatively remote, or relatively small and 
relatively near (p. 73). 


Thus we have the paradoxical result 
that an O who is set to judge physical 
size responds as if he were ignoring 
the simple geometrical help which 
would come from taking distance into 
account, while a person who is de- 
liberately trying to ignore distance in 
order to get retinal image matches 
responds as if he were taking distance 
into account. Assuming that the 
analytic judgments represented O's 
best attempt to respond in terms of 
retinal image size, and assuming that 
objective size judgments represent 
perceived size, the invariance hy- 
pothesis demands, for any given dis- 
tance, a positive correlation between 
analytic size judgments and objective 
size judgments. Such a correlation 
was not found. 

Heinemann, Tulving, and Nach- 
mias (1959) obtained nonmatching 
size and distance judgments in an 
experimental situation in which O 
was permitted only primary, monocu- 
lar cues to distance. When distance 
judgments were being made, the 
comparison was held constant at 1° 
and O reported which of two succes- 
sively presented disks, standard or 
comparison, was farther away. When 
judging the relative distance of a 
standard and a variable, most Os 
said that the objectively nearer disk 
was farther away. Since the far ob- 
ject looked nearer than the near 
object (which subtended the same 
retinal angle), it should have been 
judged as smaller than the near ob- 
ject, if the invariance hypothesis is 
true. Yet size matches were con- 
sistently “in the direction of size 
constancy”; the farther away an 
object was, the larger it was judged 
as being. 

Kilpatrick and Ittelson (1953) 
have drawn attention to two phe- 
nomena of accommodation which 
involve a size-distance paradox. 
They cite Aubert’s finding that 
partial paralysis of accommodation
produces both a reduction in the ap- 
parent size of an object and an in- 
crease in its apparent distance. They 
also report von Kries’ observation 
that an object appears to diminish 
in size and also to recede when O 
shifts fixation from that object to one 
closer by. However, both these find- 
ings are complicated by the fact that 
changes in accommodation involve 
changes in retinal image size (e.g., 
Pascal, 1952). 

Nonproportional Results of Class 2. 
A number of studies indicate that 
when visual angle is constant, changes 
in apparent size are not consistently 
accompanied by changes in apparent 
distance, and changes in apparent 
distance are not consistently ac- 
companied by changes in apparent 
size. Beginning with the classic 
experiments of Wheatstone and Judd, 
it has been frequently found that in- 
creases in convergence are regularly 
accompanied by decreases in ap- 
parent size. Insofar as the decrease 
in retinal image size accompanying 
convergent accommodation is not 
sufficient to account for the obtained 
decrease in apparent size, the in- 
variance hypothesis requires that the 
decrease in apparent size be ac- 
companied by a decrease in apparent 
distance. Yet the obtained changes 
in apparent distance are equivocal. 
This result was corroborated recently 
by Hermans (1954), who used a 
telestereoscope to produce six changes 
in convergence from 0 to 10°. As 
degree of convergence increased, the 
mean apparent size of the standard, 
as determined by O's adjustment of a 
variable, decreased significantly. 
Verbal reports indicated that some 
Os perceived a decrease in distance 
with increasing convergence, while 
other Os perceived an increase in 
distance. 


Kilpatrick and Ittelson (1953) 
found that an illusory movement in 
depth was not accompanied by the 
required change in apparent size.
The trapezoidal window was sus- 
pended in O’s line of sight, so that its 
sides were vertical and the physically 
larger end of the trapezoid was far- 
ther from O than the smaller end. 
An ordinary playing card and a piece 
of cotton were successively moved 
through an opening in the window by 
means of a thread stretched at right 
angles to the line of sight. Objects 
carried through the trapezoid in a 
straight path by the moving thread 
appear to move through an S-shaped
path in the horizontal plane. In the 
majority of observation trials, Os 
perceived definite movement in depth 
of 2 feet. Yet for the largest number 
of trials on which movement in depth 
was perceived, no size changes were 
reported either for the playing card or 
for the cotton. On the remaining 
trials on which movement in depth 
was reported, size changes in a direc- 
tion opposite to that required by the 
invariance hypothesis were reported 
about half as frequently as changes 
in the required direction. In a second
experiment, an ordinary sized playing 
card was suspended from each of the 
two stationary wires by means of 
which the trapezoid was hung from 
the ceiling. On 19 trials Os perceived 
one card to be larger than the other. 
But on only 10 of these 19 trials did 
Os perceive the apparently larger 
card to be farther away, as required 
by the invariance hypothesis. 

According to the invariance hy- 
pothesis, an improvement in O’s 
ability to estimate the distance of an 
unfamiliar object should result in an 
improvement in his ability to estimate
its size. Using a series of photographs
of the Gibson size-at-a-distance
set-up (described above), Gibson
and Smith (1952) found that
training in estimation of the dis- 
tances of the stakes in the photo- 
graphs significantly improved O's 
accuracy in estimating these dis- 
tances. However, there was no 
significant improvement in O's abil- 
ity to estimate the sizes of the stakes. 

Another finding contrary to the 
invariance hypothesis involves the 
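visual tau effect (cf. Geldreich, 1934),
in which the spatial separation between
successively flashed points appears larger
when the time interval between them is longer.
Kilpatrick and Ittelson (1953) note
that the difference in the perceived
lateral separation of the points is not
accompanied by any change in the
apparent distance of the pairs of
points from O.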

Matching Judgments of Size and 
Distance. We have seen that most 
experiments which obtain size and 
distance judgments in the same set- 
ting provide evidence against the 
invariance hypothesis. However, in 
two experiments in which conver- 
gence provided the chief distance 
cue, matching size and distance 
judgments were obtained. 

Bleything (1957) used a stereopro- 
jector to cast two ring targets onto a 
screen. Observer and projector were 
equipped with polaroid filters making 
it possible for one ring to be seen 
with one eye only and the other ring 
to be seen with the other eye only. O 
saw a single fused ring which ap- 
peared to approach and recede in 
depth as E varied the distance be- 
tween the center of the projected 
rings. As required by the invariance 
hypothesis, the apparent size of the 
fused ring increased with apparent 
distance, although the perceived size 
of the ring increased at a slightly 
greater rate than predicted by the 
formula s = (α)(d), where α is the visual angle of the ring and d its apparent distance.
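Under the invariance hypothesis, apparent size s, apparent distance d, and visual angle α are tied together by the geometry of a head-on view. The proportional formula cited for Bleything's study corresponds to its small-angle approximation, with α in radians:

```latex
\tan\frac{\alpha}{2} = \frac{s/2}{d}
\;\Longrightarrow\;
s = 2d\tan\frac{\alpha}{2} \approx \alpha d \qquad (\alpha \ll 1)
```

With α held constant, s is directly proportional to d. For a target subtending 1° (α ≈ 0.0175 rad), the approximation differs from the exact value by less than 0.01%.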

Roelofs and Zeeman (1957) report 
that when retinal image size is con- 
stant, a number of variables which 
affect apparent size also produce a
complementary change in apparent
distance. Two series of figures were 
presented. In the first series each 
card bore six figures: two pairs of 
equal sized circles which were fused 
binocularly (orthoptically) to give 
two perceived circles, and two circles 
which were presented either to the 
right or left eye alone. For the first 
series, Roelofs and Zeeman report 
the following findings: 

1. Of the two circles seen binocu- 
larly, the one which required the 
greater convergence always appeared 
smaller. As required by the invariance
hypothesis, it also always appeared
closer to O.

2. The apparent size of the circles 
seen monocularly tended to be inter- 
mediate between the apparent sizes 
of the two circles seen binocularly. 
Matching this, the apparent distance 
of the monocularly seen circles tended 
to be intermediate between the ap- 
parent distances of the binocularly 
seen circles. 

3. For a given card, the apparent 
size of a monocularly seen circle was 
closer to the apparent size of the 
binocularly seen circle from which it 
had the smallest physical separation 
on the card. As required, the ap- 
parent distance of the monocularly 
seen circle was also closer to the ap- 
parent distance of the nearby, bin- 
ocularly seen circle. 

4. The apparent size of the circles 
seen monocularly was just as strongly 
influenced by the circles seen bin- 
ocularly with a weaker convergence 
as by the circles seen binocularly with 
a stronger convergence. However, 
the apparent distance was more 
strongly influenced by the circles 
seen with a stronger convergence. 
This is the only general finding of 
Roelofs and Zeeman which contra- 
dicts the invariance hypothesis. 

5. Monocularly seen circles in the 
lower half of the stimulus card tended
to be perceived as smaller than and, 
matching this, as nearer than mo- 
nocularly seen circles in the upper half 
of the card. 

6. Monocularly seen circles in the 
nasal position tended to be seen as 
smaller than and as closer than mo- 
nocularly seen circles in the temporal 
position. 

7. Monocularly seen circles in the 
left half of the optic field tended to 
be seen as smaller than and as closer 
than monocularly seen circles in the 
right half of the field. 

The second series of stimulus cards 
used by Roelofs and Zeeman had 
three equal sized circles: a single 
circle which was presented to one eye 
only and a pair of circles which were 
fused binocularly to give a single 
perceived circle. For the second 
series, the apparent size of the circles 
seen binocularly was greater than the 
apparent size of the circles seen mo- 
nocularly. Matching this, the ap- 
parent distance of the binocular 
circles was greater than the apparent 
distance of the monocular circles. 
The findings obtained with the earlier 
stimulus series were corroborated 
with respect to the effects on apparent 
size and distance of nasal vs. tem- 
poral, right vs. left, and higher vs. 
lower stimulus positions. 

Although the general findings of 
Roelofs and Zeeman are in accord 
with the invariance hypothesis, there 
were some individual exceptions to 
the required matching of size and 
distance judgments for all findings 
except the first. 

In at least one respect, the experi- 
ments of Bleything (1957) and of 
Roelofs and Zeeman (1957) provide a 
fairer test of the invariance hypoth- 
esis than do the experiments which 
produce nonmatching judgments. 
Bleything, and Roelofs and Zeeman
had O estimate the size and distance 
of the stimulus almost simultaneously. 
In the other experiments a relatively 
long temporal interval separated the 
estimations. It is possible that when 
O is asked to make adjustments of 
size (distance), his perception of size 
(distance) occupies the center of 
attention, and his perception of 
distance (size) is relegated to the 
background. The perception of both 
size and distance when they are 
merely registered as background may 
differ from their perception when 
they occupy the observer's close at- 
tention. Hence, when O is set to 
perceive size and distance at the 
same time, it is more likely that his 
judgments will match as required by 
the invariance hypothesis than when 
he is set to perceive only size or dis- 
tance and not both. 

Despite the methodological reser- 
vations mentioned immediately 
above, there is sufficient cause for
concluding that all is not well with 
the traditional formulation of the 
size-distance relationship. It remains 
to be seen whether the generally ac- 
cepted invariance hypothesis can by 
any means be reconciled with the 
contradictory findings described in 
this section. Should this reconciliation
prove impossible, the way is open for a
restatement of the size-distance relationship.
It is also possible that in
certain instances size and distance 
perception are unrelated. Despite 
their temporal co-occurrence these 
two experiences may be independent 
but simultaneous responses to sepa- 
rate aspects of the proximal stimulus 
situation. Some experimental evi- 
dence that this may indeed be the 
case has been presented by Gruber 
(1954) and Epstein (in press). 

The Known Size-Apparent Distance Hypothesis

According to this hypothesis the 
known size of a stimulus object 
determines a unique relation of retinal 
image size to apparent distance. 
Two corollaries can be derived easily 
from this proposition: 

Corollary 1. Discrete changes in 
the size of the retinal image of an ob- 
ject whose known size remains con- 
stant will be perceived as correspond- 
ing changes in the apparent distance 
of that object. 

Every identified object may be 
said to possess an “assumed size.”
This term refers to “the entirely 
subjective sense of size which the 
observer might relate to a specifically
characterized physiological stimulus- 
pattern” (Hastorf, 1950, p. 195). 
The second corollary deals with 
assumed size. 

Corollary 2. Changes in the as- 
sumed size of an object whose retinal 
size remains constant will result in 
appropriate changes in the apparent 
distance of that object. 
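Both corollaries follow from visual-angle geometry if one assumes, as the hypothesis does, that the known or assumed size S is taken as given. The apparent distance is then the distance at which an object of size S would subtend the visual angle α, D = S / (2 tan(α/2)). The sketch below illustrates this under that assumption. The function name and the playing-card height (about 8.9 cm, used only as an example) are illustrative and are not taken from the studies reviewed.

```python
import math

def apparent_distance(assumed_size, visual_angle_deg):
    """Distance at which an object of assumed_size subtends visual_angle_deg.

    Assumes the known size-apparent distance hypothesis holds exactly and the
    object is viewed head-on; distance is in the same units as assumed_size.
    """
    if assumed_size <= 0 or not 0 < visual_angle_deg < 180:
        raise ValueError("size must be positive and angle between 0 and 180 degrees")
    half_angle = math.radians(visual_angle_deg) / 2
    return assumed_size / (2 * math.tan(half_angle))

card = 8.9  # cm, approximate height of a standard playing card (illustrative)

# Corollary 1: same known size, retinal image halved -> about twice as far.
near, far = apparent_distance(card, 2.0), apparent_distance(card, 1.0)
print(f"2.0 deg: {near:.0f} cm, 1.0 deg: {far:.0f} cm")

# Corollary 2: same retinal image, assumed size doubled -> twice as far.
small, large = apparent_distance(card, 2.0), apparent_distance(2 * card, 2.0)
print(f"assumed {card} cm: {small:.0f} cm, assumed {2 * card} cm: {large:.0f} cm")
```

Halving the angle gives almost exactly double the distance because tan is nearly linear at such small angles; doubling the assumed size doubles the distance exactly, since distance is proportional to S.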


Corollary 1 

Most of the investigations which 
have been reported are concerned 
with Corollary 1. An ingenious ex- 
perimental test of this proposition 
which has been cited often was per- 
formed by Ittelson (1951). In one 
experiment three playing cards were 
presented singly to O under condi- 
tions of complete reduction. Each 
of the cards was placed at the same 
physical distance from O. The task
for O was to adjust a comparison 
stimulus of familiar size, which was 
presented separately, until the com- 
parison object and the standard play- 
ing card appeared to be at the same 
distance. The neat turn in this ex- 
periment concerns the sizes of the 
perceived as changes in distance and
as changes in size. The larger 
should be localized at a point — 


distinguish between familiar size, on 
the one hand, and the relative size of 
the stimuli on the other (i.e., change
or difference in size of objects of 
For this reason, 


separated. Two figures were drawn 
on a two-dimensional, reversible 
screen drawing. One panel contained 
a drawing of a man, and on the other 
panel a boy of the same size and ap- 
proximate contour was represented. 
The question is whether the panel 
with the boy appears to be nearer 
more often than the panel containing 
the man. This is to be expected if 
familiar size is determining apparent
localization. The results showed that
familiar size was ineffective in this
situation.

In a second experiment the effectiveness
of relative size was tested.
The same procedure was followed 
with one difference. Whereas the 
first experiment held relative size 
constant while familiar size was 
varied, the second experiment held 
familiar size constant while varying 
relative size. Both panels contained 
drawings of the same boy, but one 
was a reduced version of the other. 
Here, relative size would lead to 
localizing the panel containing the 
larger boy nearer than the other 
panel. The results were in agreement 
with this expectation. These findings 
led the authors to suggest that there 
may be a stimulus-bound correlation
between retinal size and perceived 
distance which would make the intro- 
duction of unconscious assumptions 
(about known size) unnecessary. 

Further experimental evidence in 
support of this emphasis on relative 
size is presented by Hochberg and 
McAlister (1955). Four cards, each 
bearing one small figure and one 
large figure, were presented singly.
Card 1 bore a large circle and a small 
circle; Card 2, a large square and a 
small square; Card 3, a large circle 
and a small square; and Card 4, a 
large square and a small circle. In 
terms of relative size, it would be 
expected that Cards 1 and 2 should 
yield more three-dimensional re- 
sponses than Cards 3 or 4. This was 
the case. 

In a second experiment the authors 
inquired whether the direction of the 
three-dimensional responses is in 
accordance with what would be pre- 
dicted in terms of relative size. 


In terms of the cue of relative size the larger
figure should appear nearer than the small one
in Cards 1 and 2. They did. If this were due
to the operation of familiar size, we would
expect similar results to hold with respect to
Cards 3 and 4 (p. 296).

This did not happen.

Ittelson (1953) has replied to the 
above criticisms by citing several
instances in which relative size is not 
involved. These are cases when only 
a single object is present in the field. 
Ittelson argues that if a single, fa- 
miliar object viewed monocularly in a 
dark room is replaced by another of 
the same physical size, but of differ- 
ent assumed size, the apparent dis- 
tance of the second will be different 
from the first. The clearest demonstrations of this effect have been Ames’ “watch-card-magazine” experiment (1946-47) and Hastorf’s similar investigations (1950). We will describe Hastorf’s study later in this section when we consider Corollary 2.

In addition Ittelson (1953) main- 
tains that if a single, familiar object 
is viewed monocularly in a dark 
room, it is perceived immediately and 
unequivocally at some definite dis- 
tance which can be correctly pre- 
dicted on the basis of the familiar 
size of the object. Finally, the claim 
is made that the size-distance per- 
ceptions related to a given stimulus 
can be changed by immediately prior 
experiences which change the size 
which is attributed to the stimulus. 
As an illustration Ittelson cites the 
experiments which demonstrate the 
influence of size assumptions on 
perceived radial motion (see Kilpatrick & Ittelson, 1951).

The latter two assertions are in- 
compatible with an explanation based 
on the relative size cue. However, 
subsequent investigations have failed 
to confirm their validity, and have 
provided further support for the relative size thesis (see also Hochberg & Hochberg’s 1953 rejoinder to Ittelson). The experiments reported by
Gogel, Hartman, and Harker (1957) 
show that the retinal size of a familiar 
object is totally inadequate as a cue 
for the absolute apparent distance of 
that object. The investigations reported by Epstein (in press) confirm the findings of Gogel et al. and also demonstrate that experiences which modify O’s assumptions concerning object size do not modify his perceptual experience. The problem for
Gogel et al. (1957) was to “investi- 
gate whether the retinal subtense of 
a familiar object can act as a deter- 
miner of the apparent absolute dis- 
tance of that object from the ob- 
server” (p. 1). This study employed 
a nonvisual method of measuring per- 
ceived distance of the object. O was 
asked to throw a dart to the per- 
ceived distance without seeing the 
results of the throw. Since successive 
throws might involve relative dis- 
tance judgment, only the response to 
the object which was first perceived 
was considered in measuring the per- 
ceived absolute distance of that ob- 
ject. The stimulus object was a nor- 
mal or double sized playing card, 
located at a distance of 10 or 20 feet 
in a reduced cue situation. 

The distance responses for the stim- 
uli initially presented did not confirm 
the expectations which follow from 
the Known Size-Apparent Distance 
Hypothesis. Not only did the results 
fail to agree with any precise predic- 
tions of apparent localization, e.g., 
the double sized card at a physical 
distance of 20 feet should be localized 
at 10 feet, but the less stringent pre- 
diction, e.g., the double sized card 
should appear to be nearer than the 
normal card, was also not confirmed. 
Under these conditions perceived dis- 
tance was totally unrelated to retinal 
size. 
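The precise and the less stringent predictions can be written out in a few lines. This sketch is ours, not Gogel et al.'s; it assumes that O takes every card to be of normal size and that each card is a flat frontal target centred on the line of sight. Under that geometry an object of physical size S at distance D subtends the same angle as an object of assumed size S_a at distance S_a·D/S, so the relation is exact rather than a small-angle approximation. The function name `known_size_distance` is hypothetical.

```python
def known_size_distance(physical_distance: float,
                        physical_size: float,
                        assumed_size: float) -> float:
    """Apparent distance predicted by the Known Size-Apparent Distance Hypothesis.

    A frontal target of size S at distance D has tan(theta/2) = S / (2 D).
    An object of the assumed size S_a subtends that same angle at
    D' = S_a / (2 tan(theta/2)) = D * S_a / S.
    """
    return physical_distance * assumed_size / physical_size


NORMAL, DOUBLE = 1.0, 2.0  # card sizes in units of a normal playing card

# Precise prediction: a double sized card at 20 ft should be localized at 10 ft.
precise = known_size_distance(20.0, DOUBLE, NORMAL)

# Less stringent prediction: at equal physical distance the double sized
# card should appear nearer than the normal card.
double_at_10 = known_size_distance(10.0, DOUBLE, NORMAL)
normal_at_10 = known_size_distance(10.0, NORMAL, NORMAL)

print(precise, double_at_10 < normal_at_10)
```

The same function gives the precise requirement discussed for Epstein's quarter sized card: `known_size_distance(10.0, 0.25, 1.0)` places it at four times the distance of a normal card.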

When a similar analysis was per- 
formed for all of the four reduced cue 
situations collectively (i.e., the same 
Os in all four situations), partial sup- 
port was obtained for the Known 
Size-Apparent Distance Hypothesis 
in its less precise formulation. The 
implication of this finding is clear. 


The secondary analysis shows only that relative distance, as some function of relative retinal subtense, can occur for successively presented stimuli.

The first of three experiments re- 
ported by Epstein (in press) was es- 
sentially a replication of Ittelson's 
(1951) experiment with two major 
modifications: (a) prior to the judg- 
mental task Os in the Experimental 
Group participated in a card game 
which was designed to modify their assumptions concerning the normal size of cards and the constancy of the physical size of cards, and (b) at the conclusion of the distance settings all Os were required to judge the apparent size of the stimuli.

The results of this experiment did not support the known size hypothesis. Despite the modifying treatment
experienced by the Experimental 
Group there was no difference be- 
tween the distance judgments of the 
Experimental Group and a Control 
Group which did not have prior train- 
ing. In addition, none of the distance 
judgments met the precise quantita- 
tive requirements of the known size 
thesis, e.g., while the quarter sized 
card appeared to be more distantly 
located than the normal card, it was 
not set at four times the distance of 
the normal card. Finally, the stimuli 
of different physical size were also 
judged to be of different size. 

In Experiment II it was demon- 
strated that similar apparent dis- 
tance effects would obtain when only 
relative retinal size is operative 
(known size and assumed constancy 
of physical size absent). Finally, in 
Experiment III it was shown that in 
the absence of the relative size cue no 
systematic size-distance effects are 
obtained. The results of Experiments 
II and III bolster the position adopted
by Hochberg and Gogel. 


In this connection the results reported by Gogel and Harker (1955)
may also be cited. Gogel and Harker 
obtained judgments of apparent dis- 
tance for two playing cards of differ- 
ent sizes under reduced cue and near 
complete cue conditions. They found 
that the relative apparent depth of 
the two cards was a function of the 
lateral separation between the two 
cards. They concluded that “the 
effectiveness of size cues to relative 
depth increased as the lateral separa- 
tion of the differently sized cards was 
increased” (p. 315). There is no 
reason to expect such results if the 
original depth effects were based on 
the operation of an assumed size fac- 
tor. 

This review leads to the conclusion 
that despite its reasonableness Corol- 
lary 1 of the Known Size-Apparent 
Distance Hypothesis is unnecessary. 
Many of the experimental effects 
which are most frequently cited as 
evidence for its validity are more 
simply attributed to other factors, 
e.g., relative size. In those cases in 
which these factors are eliminated 
the “Known Size Effect” is also elim- 
inated. The question remains whether 
all reported effects of known size on 
apparent localization can be ex- 
plained in this way. This brings us to 
Corollary 2 of the Known Size- 
Apparent Distance Hypothesis. 


Corollary 2 


The second corollary requires that 
a single object whose physical size 
remains unaltered will undergo 
changes in apparent spatial localiza- 
tion with changes in the physical 
size which O attributes to the object. 
Thus, if the same object is assumed 
by O to have a small size at one time, 
and a large assumed size at a later 
time, it will be perceived to be more 
distant at this later time although the 
physical distance of the object is the 
same at both times. It is obvious 
that effects of this nature cannot be
accounted for by processes which de- 
pend on the opportunity for compar- 
isons of successively presented stimuli 
which differ along a physical dimen- 
sion. 

There are very few experimental 
studies which demonstrate that such 
an effect does indeed obtain. In 
Hastorf’s (1950) investigations a rec- 
tangular or circular area of light was 
given a “large assumed size meaning”
or a “small assumed size meaning.”
That is, the rectangle was called either 
an envelope or a calling card, and the 
circle was called either a billiard ball 
or a ping-pong ball. The size at which 
the stimulus was set, in order to ap- 
pear at a specific distance, varied 
when the assumed size attributed to 
the stimulus was varied by the size 
suggestion, i.e., by naming the stim- 
ulus. 
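Corollary 2 can be made concrete by solving the size-distance relation for the size O must give the patch. The sketch assumes the patch is a frontal target of physical size S at physical distance D that O treats as an object of assumed size S_a; for it to appear at a target distance T, the relation D·S_a/S = T gives S = S_a·D/T. The function name, widths, and distances are hypothetical placeholders, not Hastorf's values; only the proportionality between setting and attributed size matters.

```python
def required_setting(assumed_size: float,
                     physical_distance: float,
                     target_distance: float) -> float:
    """Physical size the patch needs so that an object of assumed_size
    would appear at target_distance (solves D * S_a / S = T for S)."""
    return assumed_size * physical_distance / target_distance


# Hypothetical illustrative values (not Hastorf's data).
ENVELOPE_WIDTH_IN = 9.5      # assumed size under the "envelope" label
CALLING_CARD_WIDTH_IN = 3.5  # assumed size under the "calling card" label
PHYSICAL_DISTANCE_FT = 10.0
TARGET_DISTANCE_FT = 12.0

envelope = required_setting(ENVELOPE_WIDTH_IN, PHYSICAL_DISTANCE_FT, TARGET_DISTANCE_FT)
card = required_setting(CALLING_CARD_WIDTH_IN, PHYSICAL_DISTANCE_FT, TARGET_DISTANCE_FT)

# The two settings differ in the same ratio as the attributed sizes.
print(f"envelope {envelope:.2f} in, calling card {card:.2f} in, "
      f"ratio {envelope / card:.3f}")
```

Whatever distances are chosen, the "large assumed size" label should call for the larger setting, which is the direction of the effect Hastorf reported.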

In a study of the effects of past 
experience on apparent size, Smith 
(1952) reported findings which may 
be interpreted in the same way. In 
the first stage of the experiment O 
judged the apparent distance of sev- 
eral simple geometrical forms, e.g., 
circles and squares. Then, over a 
period of 2.5 weeks Os participated in 
a series of tasks requiring the manip- 
ulation and discrimination of geo- 
metrical forms of the same shape but 
larger in size. In this way E hoped 
to alter the attributed size of the orig- 
inal forms. Then the Os were re- 
tested, i.e., Os repeated the judgments 
which were made prior to training. 
The distance judgments were observed to change in the direction demanded by the modification in attributed size.

Finally some incidental findings of 
Ittelson (1951) may be mentioned. 
In one variation of the experiment 
described earlier O judged the ap- 
parent distance of a half sized playing 
card and a matchbox of identical size 
when both were located at the same objective distance of 7.5 feet. The playing card was localized at a distance of 14.99 feet while the matchbox was judged to be at a distance of 8.96 feet. Apparent distance was influenced by O’s assumptions concerning the physical size of the stimulus objects.
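The card result can be checked against the known size prediction with simple arithmetic. Assuming O takes the half sized card to be a normal card, the relation D' = D·S_a/S places it at twice its physical distance; the observed values are those just quoted. The matchbox is not given a prediction because the text states no assumed size for it.

```python
PHYSICAL_DISTANCE_FT = 7.5
CARD_PHYSICAL_SIZE = 0.5   # half sized card, in units of a normal card
CARD_ASSUMED_SIZE = 1.0    # assumption: O takes it to be a normal card

predicted_card_ft = PHYSICAL_DISTANCE_FT * CARD_ASSUMED_SIZE / CARD_PHYSICAL_SIZE
observed_card_ft = 14.99      # Ittelson (1951), as reported above
observed_matchbox_ft = 8.96   # same physical size and distance as the card

print(f"predicted {predicted_card_ft:.2f} ft, observed {observed_card_ft:.2f} ft")
print(f"card/matchbox apparent distance ratio "
      f"{observed_card_ft / observed_matchbox_ft:.2f}")
```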

Corollary 2 has received support 
from the investigations described 
above. Still, there is clearly a need for 
further experimentation. In partic- 
ular it would be useful to have the 
results of experiments which meet the 
following three requirements: 

1. A measure of O's immediate per- 
ceptual impression should be ob- 
tained. In most cases O has been 
allowed an extended period of time in 
which to make an adjustment which 
he is “satisfied with.” Under such 
conditions many judgmental and at- 
titudinal factors may enter into the 
adjustment process, and contaminate 
or at least alter the identity of the 
effect. 

2. Different Os should be used for 
the various attributed size conditions. 
It is possible that the same O perform- 
ing under the various conditions may 
be making memorial comparisons be- 
tween the first attributed size-ap- 
parent distance judgment and the 
requirements of the current situation. 
This possibility is minimized if an ex- 
tended temporal interval intervenes 
between the required judgments. 
Nonetheless, even though 6 days 
intervened between successive critical 
judgments in Hastorf’s experiments, 
Hastorf (1950) reports that “some
subjects did appreciate the fact that 
it was the same stimulus objects 
being given two different names” (p. 
208). 

3. In addition to these two require- 
ments it might be helpful to obtain a 
measure of apparent size independently of O’s distance judgments. The results of earlier experiments suggest that such information may be instructive.


THE RELATIONSHIP BETWEEN THE SIZE OF THE AFTERIMAGE AND DISTANCE


A special case of invariance is Emmert’s Law. The law states that the size of a projected afterimage (AI) is directly proportional to the distance from the eye to the projection surface. This statement follows from simple geometric considerations if we keep in mind that for the case of AIs the subtended visual angle remains constant regardless of variations in projection distance. The apparent simplicity disappeared following Boring’s (1940) well-known attempt to demonstrate that Emmert’s Law implies its converse, size constancy. Boring’s thesis has been expressed succinctly by Edwards (1950):

What Boring was saying was that apparent 
size must increase with constant retinal size 
and increasing distance, if it is also true that 
apparent size remains constant with shrinking 
retinal size and increasing distance (p. 611). 
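The geometric statement of Emmert's Law, read as a purely physical relation in Young's sense, is easily sketched. Because the AI occupies a fixed retinal region, its visual angle θ is constant, and the linear extent it covers on a frontal surface at distance D is 2D·tan(θ/2), which is proportional to D. The 2° angle below is a hypothetical value, and the sketch models physical extent only; it says nothing about apparent size, which is exactly what Boring's substitution brings into question.

```python
import math


def afterimage_extent(visual_angle_deg: float, projection_distance: float) -> float:
    """Linear extent on a frontal projection surface covered by an AI of
    constant visual angle (physical, Euclidean form of Emmert's Law)."""
    half_angle = math.radians(visual_angle_deg) / 2.0
    return 2.0 * projection_distance * math.tan(half_angle)


ANGLE_DEG = 2.0  # hypothetical visual angle of the fixation stimulus

for distance in (1.0, 2.0, 4.0):
    # Doubling the distance doubles the extent; units follow the distance.
    print(distance, round(afterimage_extent(ANGLE_DEG, distance), 4))
```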


We will not review the logic of Bor- 
ing’s formulations. It will suffice to 
point out that these formulations 
hinge on Boring’s substitution of ap- 
parent size for physical size in the 
optical geometry of Emmert’s Law. 
This substitution has been strongly 
criticized by Young (1950). Never- 
theless, Boring’s thesis has stimulated 
the major portion of writings con- 
cerned with Emmert’s Law in the 
last 10 years. This work has followed 
two main themes. 


The Historical Issue 


Young (1950, 1951) has contended that Emmert intended to deal only with nonpsychological, Euclidean optical relationships. The contention is that Emmert’s original reference (1881) was to the physical size of the AI as determined by direct physical measurement of the occluded area on the projection surface. Young also maintains that a fundamental difference exists between real objects and AIs, and that it is inappropriate to speak of the apparent size of the latter.

The opposing view holds that Em- 
mert was either concerned directly 
with apparent size and had, himself, 
implicitly made the substitution of s 
for p for the special case of AIs (Edwards, 1950), or that he did not distinguish the two different meanings
of size perception (Boring & Edwards, 
1951). The determination of apparent 
size requires a comparative technique. 
This method usually takes the form 
of judging the size of the critical ob- 
ject on the basis of an adjustable 
comparison stimulus or a series of 
different-sized stimuli. These, generally, are separated both in the
lateral and frontal plane from the 
critical object. This method has 
found wide application in research on 
size constancy where apparent size is 
the crucial dimension. 

Despite a careful reading of Em- 
mert’s original article (1881) there is 
little that we can contribute toward 
a resolution of this historical issue.⁴ The one experiment which Emmert described in detail did utilize comparative stimuli, but both were attached directly to the projection surface. We are inclined to agree with Boring and Edwards (1951) that Emmert, in his own research, was not making a clear distinction between physical and apparent size.

4 We are indebted to Martin Scheerer of the Department of Psychology of the University of Kansas for his expert translation of Emmert’s article.


The Theoretical Issue


The second aspect of the controversy is of greater significance. If Emmert’s Law and size constancy are derivable from the same processes, then those conditions which determine the perceived size of real objects should affect the size of the AI also. If communality of process is not the case, then the size of the AI should be unaffected by the same variables which affect the perceived size of real objects (or at least the effects should not be identical).

Edwards (1950) suggested that an 
experimental decision on this matter 
depends in part on the selection of an 
appropriate method of measurement. 
E can adopt either of two methods: 
(a) indirect measurement by employ- 
ing a comparison stimulus or (b) direct 
measurement on the plane of projec- 
tion. Edwards predicted that under 
reduced cue conditions Emmert's 
Law would fail when measured by 
Method a (i.e., the size of the AI 
would remain constant with increas- 
ing distance) but would hold when 
measured by Method b. Much of
Edwards’ position had been stated 
earlier by Helson (1936). In this 
paper Helson interpreted his results 
as showing that: 

when cues to distance and surroundings are eliminated the apparent size remains practically constant while the measured size of the projected image tends to obey Emmert’s Law (p. 638).


Edwards (1953) tested one aspect of this prediction, viz., that the apparent size of the AI when measured by the comparison method would not conform to Emmert’s Law under reduced cue conditions. Os projected AIs monocularly onto a dimly illuminated screen while looking through a reduction tube. The distance of the projection screen varied in five steps from 42 to 90 inches. A 2-inch luminous square in the same reduced field was adjusted until it appeared equal in size to the AI. No significant differences between the various distances were obtained. Edwards concluded that Emmert’s Law (i.e., “Emmert’s Law of Apparent Size”) had failed under reduction conditions. However, as Edwards himself admits, it is somewhat tenuous to uphold a prediction on the basis of confirmation of the null hypothesis.
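The size of the change Emmert's Law would have demanded in Edwards' apparatus follows from the two end points of his range. Under the law, the matched size at the farthest screen should be 90/42 times that at the nearest, whereas strict size constancy predicts a ratio of 1. This arithmetic is our illustration; the intermediate screen distances are not given in the text and are not reconstructed here.

```python
NEAR_SCREEN_IN = 42.0  # nearest projection distance (Edwards, 1953)
FAR_SCREEN_IN = 90.0   # farthest projection distance

emmert_ratio = FAR_SCREEN_IN / NEAR_SCREEN_IN  # AI size grows with distance
constancy_ratio = 1.0                          # apparent size unchanged

print(f"Emmert's Law: {emmert_ratio:.2f}x, constancy: {constancy_ratio:.2f}x")
```

A change of roughly twofold in the matched size is what Emmert's Law required; the absence of significant differences lies closer to the constancy value, although, as noted above, a null result cannot by itself establish that prediction.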

Hastorf and Kennedy (1957) also 
contend that the controversy con- 
cerning the relationship of Emmert’s 
Law to size constancy is primarily a 
matter of the type of measurement 
used. Under reduction and nonreduction conditions Os judged the size of the real objects and AIs at various distances by the comparison method and
the direct method (bracketing spot- 
lights). The results for the compar- 
ison method confirmed Edwards’ 
position, i.e., in the reduced cue situa- 
tion, size constancy was greatly de- 
creased and Emmert’s Law did not 
obtain. With direct measurement 
there was no significant difference in 
the size of the AI between the re- 
duced and full cue situations. This 
outcome supports Young's position. 
Thus, both sides of the controversy 
received support as did the authors’ 
contention that the controversy 
hinges on different measurement techniques. However, Hastorf and Kennedy also reported that the use of
bracketing spotlights in a dark room 
might provide a distance cue. If this 
is true, then it must be concluded 
that the direct measurement of the 
physical size of the AI under authen- 
tic reduction conditions remains to be 
accomplished. 

Crookes (1959) takes a somewhat 
different approach to the problem of 
measurement. Crookes agrees with 
Young (1950, 1951) that Emmert’s Law concerns “real,” not apparent size. Further, he proposes that if Boring (1940) is right, Emmert’s Law and size constancy should hold equally well when apparent size matches are obtained under the same conditions. Using the comparison method under “analytical” instructions, i.e., stressing retinal size, Os matched AIs and real objects. Crookes found that O made significantly better matches (i.e., showed significantly more constancy) in the case of the real objects.
Crookes concludes that the subsump- 
tion of Emmert’s Law and size con- 
stancy under a common heading is 
not justified. However, the objection 
could be raised that the analytical 
attitude induced by the instructions 
does not suit the purposes of research 
on constancy phenomena. Also, there 
is some question whether the greater 
constancy in the case of the real ob- 
jects might not be due to the rela- 
tively greater ease of viewing real 
objects. 

These studies concerned with the 
relationship between Emmert’s Law 
and size constancy are not unanimous 
in their conclusions. Nevertheless, it 
is generally conceded that the method 
of measuring the AI may be critical.
Thus, we might expect two or more 
forms of Emmert’s Law to emerge, 
each embodying its own mode of 
measurement and each bearing a dif- 
ferent relationship to other size-dis- 
tance phenomena. 

New approaches to measurement 
should be tried in this context, espe- 
cially those promising some increase 
in precision. For example, Onizawa 
(1954) has developed a method 
whereby a screen bearing a comparison stimulus moves away from O, while a projection screen bearing an AI moves toward O. When O perceives equality between the AI and the comparison stimulus, he stops this movement. Ratios based on the respective distances of the two screens from O are compared with like ratios predicted from Emmert’s Law. Onizawa presents data which indicate that his technique incurs less variability than the method of directly measuring the AI on the projection surface.

However, before the role of differ- 
ent measures can be clearly evaluated 
it will be necessary to test them to- 
gether under identical conditions 
(e.g., reduction conditions). This re- 
quires that a given measure must not, 
itself, disqualify such conditions. 
Hastorf and Kennedy’s (1957) obser- 
vation (i.e., spotlights provide dis- 
tance cues under reduced conditions) 
illustrates this problem. 

Another matter deserving com- 
ment is related to the hybrid nature 
of Boring’s formulation. While Bor- 
ing has substituted apparent size for 
real size he has not seen fit to substi- 
tute apparent distance for physical 
distance. A careful reading of Bor- 
ing’s discussion (1942, p. 292) reveals 
a confusion of physical distance with 
apparent distance. The two terms 
are used interchangeably, seemingly 
without regard for any differences 
which may exist. It would be inter- 
esting to obtain pairings of the ap- 
parent size of the AI with the appar- 
ent distance of the projection surface. 
Such relationships if found to con- 
form to Emmert’s Law could hardly 
be explained in terms of the requirements of Euclidean geometry which
applies only to physical distances and 
extents. 

In this regard an additional com- 
plicating factor has been described by 
Ohwaki (1955). While expected 
values of Emmert’s Law have been 
based on retinal size arising from the 
physical size and distance of the fixa- 
tion object, Ohwaki (1955) found that 
perceived, not physical, distance was 
crucial in determining retinal size. 
Perceived distance was effective with 
either ordinary distance cues or past 
experience available. The interpreta- 
tion was offered that it is perceived 
distance which underlies accommodation. Accommodation in turn regulates the size of the retinal image.

Finally, the problem of the phys- 
ical as opposed to apparent size of 
the fixation object should be men- 
tioned. Although this problem has 
received recent treatment in studies 
of figural aftereffects, its relevance 
with respect to Emmert’s Law has 
not been explored. 

It seems obvious that a refined 
statement of Emmert’s Law must 
await intensive treatment of the 
variables discussed above (i.e., ap- 
parent distance of the projection sur- 
face, apparent size and distance of 
the fixation figure). 


Other Determinants of the Size of the Afterimage

In a series of experiments, Young 
investigated the effect of a number of 
additional variables on the size of 
projected AIs using spotlights to
outline the AI on the projection plane. 
In one study Young (1952a) varied 
the exposure time of the stimulus ob- 
ject in seven steps ranging from 0.01 
to 40.0 seconds. No significant varia- 
tions in the size of the AI were found 
with variations in stimulation time. 
Young (1952b) also investigated sev- 
eral features of the projection ground. 
In one experiment the illumination on 
the projection ground was varied 
through five log steps. No variation 
in the size of the AI was found. An- 
other experiment (1952b) utilized 
pictures containing strong linear per- 
spective. AIs were projected to specified points on these pictures and compared with AIs projected to similar
points on a blank screen. The sur- 
faces with linear perspective were 
found to influence AI size. It is 
tempting to account for these re- 
sults by referring to presumed 
changes in apparent distance result- 
ing from the differences in geometric 
perspective. Unfortunately this interpretation is complicated by the
finding that there was little agree- 
ment between the Os in degree or 
direction of the size effect. However, 
an earlier study by Frank reported by 
Koffka (1935, p. 212) lends credence 
to an apparent distance interpretation. Frank used a perspective drawing of a deep tunnel. AIs projected
to a phenomenally remote part of 
this tunnel were considerably larger 
than those projected to a near part. 
A similar effect is observed in the 
“Afterimage Demonstration” (Ittel- 
son, 1952, pp. 32-33). Appropriate 
adjustments of the interposition indi- 
cations using the overlay demonstra- 
tion apparatus (Ittelson, 1952, p. 13) 
produce changes in the apparent dis- 
tance of the projection surface and 
proportional changes in the apparent 
size of the AI.

The final study in this series 
(Young, 1952c) concerned the effect 
of large distances. In daylight AIs
were projected in an open field to
distances ranging from 25 to 1,250
meters. In each case the obtained values
were less than those expected on the 
basis of Emmert's Law. The hypothe- 
sis was advanced that with a brighter 
fixation stimulus (a square with a 
luminance of approximately 1,700 mil- 
lilamberts), the retinal image is smal- 
ler, and consequently, the AI is smal- 
ler. 

An interesting sidelight to the type 
of research on Emmert's Law con- 
sidered so far is Oswald's (1957) study 
of the peripheral and central origins of AIs. Oswald uses these terms to contrast AIs in which the stimulation is confined to the retina with those involving the higher “representative” or brain centers. He cites a number of investigations, including his own, in which AIs were obtained peripherally by presenting a light to an eye temporarily blinded by local pressure to the eyeball. Oswald also reviews a number of positive and negative reports of “central” AIs following imagined (visualized) objects or objects experienced in dreams. In his main experiment Os “imagined” crosses or squares and then projected AIs to a screen at various distances. Most Os were able to achieve AIs to imagined stimuli. However, very few AIs conformed to Emmert’s Law. In this regard Oswald cites several earlier reports that eidetic Os deviate markedly from Emmert’s Law when real stimuli are employed.

With further reference to individual 
differences, Brengelman (1956) found 
deviations from Emmert's Law to be 
larger in his neurotic group than in
his normal and psychotic groups.

Both large individual differences, such as those reported by Oswald, and smaller but consistent ones are inexplicable from the standpoint of a purely physical law. As an example of the latter kind, Young (1948) found that all of his Os (N = 5) yielded values falling consistently short of Emmert’s Law values by a small margin. One would expect that variations due to inaccuracies of measurement alone would be randomly distributed.
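The force of Young's observation can be expressed as a sign test. If each observer's deviation from the Emmert's Law value were pure measurement error, equally likely to fall short or long and independent across observers, the chance that all five fall short is (1/2)^5. This calculation is our illustration of the argument, not an analysis reported by Young, and the function name is hypothetical.

```python
from math import comb


def sign_test_upper_tail(n: int, k: int) -> float:
    """P(at least k of n independent deviations share a given sign)
    when each sign has probability 1/2 (pure measurement error)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


p_all_short = sign_test_upper_tail(5, 5)  # all five Os fall short
p_all_same_direction = 2 * p_all_short    # all short or all long

print(p_all_short, p_all_same_direction)
```

With only five observers the two-sided value (.0625) does not reach the conventional .05 level, so the consistency is suggestive rather than decisive on its own.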


CONCLUDING DISCUSSION


It seems to us that at least one 
compelling conclusion emerges from 
the survey we have just completed: 
the size-distance relationship ex- 
pressed in the several formulations of 
the invariance hypothesis should not 
be assigned a unique or primary status 
in explanations of space perception. 
We have seen that this is only one of 
the several possible and actual rela- 
tionships which are obtained. This 
need not cause any great consterna- 
tion to those who recall the origin of 
the hypothesis in Euclidean geometrical principles. Although the distinction is sometimes overlooked, it should be clear that the invariance
hypothesis is a psychological proposi- 
tion, and not a geometrical proposi- 
tion. By no stretch of the imagina- 
tion can Euclid’s principles be ap- 
plied directly to space perception. Of 
course, the analogy is plain and very 
tempting, and a successful transla- 
tion would have been a happy logical 
circumstance. Nonetheless, failure to 
accomplish this translation should 
not cause surprise. 

This brings us to a second remark. 
A great deal of logical and experi- 
mental analysis has been aimed at 
clarifying the term “size.” We now 
distinguish not only real physical 
size, apparent size, and retinal size, 
but also assumed size, apparent an- 
gular size, etc. Usually the investiga- 
tor makes explicit which aspect of size 
perception he is dealing with. How- 
ever, with regard to distance, there is 
often a confusion of physical distance 
and apparent distance. We have 
seen that there is no unequivocal 1:1 
relationship between physical dis- 
tance and apparent distance. There- 
fore, it is not clear how experimental 
investigations of the size-distance 
relationship are to be interpreted 
when apparent distance judgments 
are not obtained. It seems to us that all studies of size and distance should obtain paired size-distance judgments.

This brings us to the methodo- 
logical point which we mentioned 
earlier. Almost all of the experiments 
which have obtained paired size-dis- 
tance judgments (including Epstein, 
in press) have done so in a successive 
judgment situation. We have already 
indicated the reasons for our dissatis- 
faction with this procedure. Here we 
wish only to reiterate the desirability of future investigations that employ a simultaneous judgment technique.

Finally, we wish to endorse a com- 
ment made earlier by Kilpatrick and 
Ittelson (1953) concerning individual 
differences. In order to assess the 
generality of the various size-distance 
hypotheses we need to look more care- 
fully at the results of the perform- 
ances of individual Os. In repeating 
some of the published research the 
first author has often been struck by 
the degree of interobserver and intra- 
observer variability. Results con- 
firming various aspects of the invar- 
iance hypothesis do not allow E to 
say much about the individual O. In 
view of the “lawfulness” which is 
usually ascribed to the invariance 
hypotheses this extreme variability 
cannot be overlooked. 

